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L  INTRODUCTION 


The  first  satellite  motion  model  used  by  the  Air  Force  Space  Command 
(AFSPACECOM)  is  known  as  the  Simplified  General  Perturbations  (SGP)  satellite 
motion  model.  SGP  is  an  analytic  model  based  on  the  work  of  Kozai  (1959)  and 
Brouwer  (1959),  made  operational  by  Hilton  and  Kuhlman  (1966).  It  is  implemented  by 
a  Fortran  subroutine,  SGP,  and  is  used  to  track  near-Earth  satellites  (period  less  than  225 
minutes)  having  orbits  with  low  eccentricities.  By  1976  a  second  satellite  motion  model 
for  near-Earth  satellites  was  implemented  by  a  Fortran  subroutine,  SGP4,  replacing  SGP 
as  the  operational  satellite  motion  model  used  by  the  AFSPACECOM.  SGP4  is  an 
analytic  satellite  motion  model  based  on  the  theory  developed  by  Lane  and  Cranford 
(1969)  which  uses  the  solution  of  Brouwer  (1959).  To  meet  the  needs  for  deep-space 
satellites  another  model,  SDP4,  was  made  operational.  SDP4  is  an  extension  of  SGP4 
where  the  deep-space  theory  is  derived  from  Hujsak  (1979).  Currently,  SGP4  refers  to 
one  Fortran  subroutine  that  has  combined  the  former  separate  subroutines  SGP4  and 
SDP4,  and  is  what  the  AFSPACECOM  now  uses  to  generate  the  North  American 
Aerospace  Defense  and  Command  (NORAD)  element  sets. 

The  AFSPACECOM  currently  tracks  approximately  7000  space  objects  on  a  daily 
basis.  As  space  technology  progresses  and  as  the  dependence  upon  applications  in  space 
increases,  the  number  of  space  objects  that  require  tracking  will  continue  to  grow.  The 
future  use  of  operational  satellite  motion  models  used  by  the  AFSPACECOM  need  to 
involve  shorter  computational  time  to  obtain  near  real  time  orbit  predictions.  Also,  if  the 
AFSPACECOM  chooses  to  develop  models  with  greater  accuracy,  these  models  will 


undoubtedly  require  even  more  computational  time.  Such  accurate  models  can  be  used 
to  predict  orbits  of  smaller  objects,  increasing  the  number  of  objects  by  ten  fold. 

Computational  time  in  implementing  a  satellite  motion  model  is  highly  dependent 
upon  the  computer  configuration  used.  Parallel  computing  has  the  potential  to 
significantly  decrease  computational  time  over  serial  computing  currently  used  by  the 
AFSPACECOM.  Use  of  parallel  computers  has  already  proven  to  be  beneficial  in 
reducing  computation  time  in  many  other  applications.  Once  a  decision  is  made  to 
convert  to  parallel  computing,  future  satellite  motion  models  can  be  developed  in  a 
parallel  architecture  format.  Ideally,  these  models  made  operational  with  the  use  of 
parallel  computers  will  result  in  computationally  efficient  programs  that  yield  highly 
accurate  results. 

This  thesis  investigates  the  parallel  computing  potential  of  the  AFSPACECOM's 
first  operational  satellite  motion  model,  SGP,  and  the  current  satellite  motion  models, 
SGP4  and  SDP4,  using  the  Intel  iPSC/2  hypercube  multicomputer.  The  SGP  model  was 
included  since  it  is  still  used  at  several  surveillance  sensor  sites.  The  following  chapter 
provides  a  description  of  the  SGP  model  and  outlines  the  algorithm  used  by  the  Fortran 
subroutine,  SGP.  Chapter  HI  gives  a  general  description  of  the  SGP4  and  SDP4  models 
and  how  they  compare  with  the  SGP  model.  Chapter  IV  presents  an  overview  of  parallel 
processing  and  discusses  the  methods  to  parallelize  a  serial  algorithm  to  be  applied  to  the 
hypercube.  In  Chapter  V,  the  parallelization  of  SGP,  SGP4,  and  SDP4  are  presented 
with  their  respective  success  in  reducing  computation  time.  Here  the  effects  of  reading 
and  writing  to  and  from  the  disk  are  not  included.  A  comparison  between  the 
parallelized  versions  of  SGP,  SGP4,  SDP4  and  PPT2  is  also  included.  PPT2  is  an 
analytic  satellite  motion  model  used  by  the  Naval  Space  Surveillance  Center 
(NAVSPASUR),  and  a  parallelized  version  was  created  by  CPT  Warren  E.  Phipps 
(1992)  at  the  Naval  Postgraduate  School  using  the  same  multicomputer,  Intel  iPSC/2 


hypercube.  Chapter  VI  further  investigates  the  parallel  computing  potential  of  SGP4  and 
SDP4  by  considering  the  effects  of  increased  computation  time  and  the  effect  of  reading 
and  writing  to  and  from  the  disk.  The  last  chapter  of  this  thesis  provides  conclusions  and 
suggestions  for  future  research. 


IL  SIMPLIFIED  GENERAL  PERTURBATIONS  (SGP)  MODEL 


A.  OVERVIEW 

The  Simplified  General  Perturbations  model  is  the  original  model  used  by  the 
AFSPACECOM  to  track  artificial  satellites  orbiting  the  earth,  implemented  by  the 
subroutine  SGP.  Given  pseudo  elements  generated  by  the  AFSPACECOM,  SGP  users 
can  predict  the  state  vector  (position  and  velocity)  at  a  future  time.  This  model  considers 
perturbing  accelerations  due  to  the  oblateness  of  the  Earth,  asymmetry  of  the  Earth's 
mass  about  the  equatorial  plane,  and  atmospheric  drag.  This  model  does  not  include 
perturbation  effects  due  to  longitudinal  variation  in  the  Earth’s  gravitational  potential,  or 
due  to  other  celestial  bodies  such  as  the  moon  or  the  sun. 

Satellite  motion  models  can  be  classified  according  to  how  the  equations  of  motion 
are  solved.  The  two  general  classifications  are  known  as  general  perturbations  and 
special  perturbations.  General  perturbation  models  involve  solving  the  equations  of 
motion  analytically.  As  more  perturbations  are  included  in  a  satellite  motion  model  to 
increase  accuracy,  analytical  solutions  become  increasingly  complicated  if  not  impossible 
to  solve.  On  the  other  hand,  it  is  always  possible  to  use  special  perturbations.  Special 
perturbation  models  involve  solving  the  equations  of  motion  through  numerical 
integration.  Although  special  perturbation  models  are  generally  more  accurate  and  less 
complicated  than  general  perturbation  models,  they  are  computationally  expensive.  A 
third  model  type  known  as  a  semi-analytic  model  combines  the  general  and  special 
perturbation  techniques  in  an  attempt  to  balance  the  advantages  and  disadvantages  of  the 
two  techniques,  resulting  in  a  highly  accurate  and  computationally  efficient  model.  The 
SGP  model  is  an  example  of  a  general  perturbations  model. 


Satellite  motion  models  can  also  be  classified  according  to  how  the  variation  in  the 
satellite’s  motion  is  represented.  The  two  general  classifications  are  known  as  variation 
of  dements  and  variation  of  coordinates.  Variation  of  elements  identifies  variations  in 
terms  of  changes  in  the  osculating  elements  with  respect  to  time.  Variation  of 
coordinates  involves  choosing  a  coordinate  system  and  describing  the  changes  in  position 
and  velocity  with  respect  to  time  in  that  coordinate  system.  The  SGP  model  uses 
variation  of  elements  and  describes  the  changes  in  the  classical  orbital  elements  with 
respect  to  time. 

B.  THEORY 

The  Simplified  General  Perturbations  (SGP)  model  is  an  analytic  model  developed 
by  Hilton  and  Kuhlman  (1966).  SGP's  gravitational  submodel  is  a  simplification  of  the 
work  done  by  Kozai  (1959)  and  Brouwer  (1959).  The  atmospheric  drag  submodel 
describes  the  mean  motion  as  a  linear  function  of  time. 

Before  proceeding  with  the  theoretical  discussion  of  these  models,  a  brief 
explanation  of  how  SGP  defines  satellite  motion  is  needed.  The  variation  in  the  motion 
due  to  the  perturbation  effects  is  described  by  the  changes  in  the  classical  orbital 
elements  with  respect  to  time.  The  six  classical  elements  are  n,  e,  i,  Q,  to,  and  M  (see 
Figure  2.1),  where 

n  =  mean  motion 
e  =  eccentricity 

i  =  inclination  of  the  orbital  plane  to  the  equator 
=  right  ascension  of  the  ascending  node 
©  =  argument  of  perigee 
M  =  mean  anomaly  at  epoch. 

The  position  and  velocity  vectors  can  be  obtained  from  these  six  orbital  elements. 
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Figure  2.1  Classical  Orbital  Elements 


XE  represents  the  eccentric  anomaly  and  is  related  to  the  mean  anomaly  by, 

E-es\nE  =  M. 
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1.  Kozai's  Gravitational  Model 

Here  a  simplification  of  Kozai's  model  is  presented  (Roy,  1988,  pp.  312-320). 
Kozai's  model  is  more  involved  than  the  classical  Keplerian  model  for  idealized  satellite 
motion.  In  the  classical  model  the  Earth's  potential  at  an  external  point,  a  distance  r 
from  its  center,  is  expressed  by 

U  =  ^-  (2.1) 

r 

where  p  is  the  gravitational  parameter,  equal  to  the  product  of  the  Earth's  mass  and  the 
gravitational  constant  G.  Thus,  a  satellite  of  mass  m  at  a  distance  r  possesses  a  potential 
energy  equal  to 


-mU  =  -m^. 

r 

Equation  2.1  represent  the  potential  for  a  perfectly  spherical  Earth.  In  Kozai’s 
model,  the  Earth  is  considered  to  be  a  planet  that  is  symmetric  about  the  spin  axis  and 
asymmetric  about  the  equatorial  plane  .  Thus,  the  Earth's  potential  may  be  written  as 


P„(s  in  8) 


(2.2) 


where  Jn  are  constants,  R  is  the  Earth's  equatorial  radius,  8  is  the  declination,  and 
PH  (sin  8)  is  the  Legendre  Polynomial  of  order  n  in  sin  8. 

The  Earth's  potential  can  be  written  in  terms  of  a  disturbing  function  F 
U  =  U0+F 

where  U0  represents  the  classical  potential.  Thus  the  disturbing  function  can  be  written 
as 

F  =  (/-£/„  P„(sin5).  (2.3) 

r  r  ^ 2  \r J 
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Now,  J2  is  of  the  order  10"\  and  J3,JA,...  are  of  order  10"6  or  less.  Since 
are  relatively  insignificant  and  the  j  factor  decreases  as  n  increases, 

further  analysis  will  only  include  J2  and  /3,  the  second  and  third  harmonics,  in 
agreement  with  the  SGP  model.  Thus,  the  disturbing  function  can  be  written  as 


F  =  -£ 


'*173 


,  -sin 


Krj 

Substituting  the  known  relations, 


in25_i]  +  73^Jgsini5_l]sin6' 


(2.4) 


a(l-e2) 

1+ecosu 


and 


sinS  =  sinisin(D  +  m), 

where  u  is  the  true  anomaly  and  5  is  the  declination ,  as  illustrated  in  Figure  2.1,  F 
becomes 


-  -  -  sin2  i  +  -  sin2  i  cos  2(u  +  to) 

2  a  UJ 

.3  2  2 

(2.5) 


73/?3 


l^S*n2,-^JS*n^  +  0)"~  ■^sin2/sm3(u  +  co)) 


sin  i 


Next,  the  disturbing  function  is  separated  into  first-order  secular,  second-order 
secular,  long-period,  and  short-period  terms.  These  parts  are  referred  to  as  Fj ,  F2 ,  F3  ,and 
F4  respectively.  Now,  F  is  periodic  with  respect  to  the  true  anomaly  v  and  the  radial 
distance  r .  The  true  anomaly  v  and  the  mean  anomaly  M  are  related  by 


dx> 

dM 


(2.6) 
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Since  the  quantities  -  and  t)  are  functions  of  e  and  M  only,  F  is  periodic  with  respect 
a 

to  M.  Terms  in  F  depending  on  M  and  not  on  to  are  short-period.  Terms  in 
F  depending  on  to  but  not  on  M  are  long-period.  The  resulting  terms  depending  neither 
on  M  nor  on  to  are  secular.  Thus,  the  terms  of  F  are  sorted  in  the  following  way: 


Ft:  These  terms  are  obtained  by  averaging  with  respect  to  M  those  parts  of  the  first 
portion  (i.e.,  J2  part)  of  F  that  are  independent  of  M  and  co  . 

F2:  These  terms  would  be  obtained  by  averaging  with  respect  to  M  those  parts  of  the 
second  portion  (i.e.,  /3  part)  of  F  that  are  independent  of  M  and  0),  but  since  there  are 
no  such  terms  F2  is  zero. 

F3:  These  terms  are  obtained  by  taking  the  mean  value  of  the  second  portion  of  F 
with  respect  to  M. 

F4;  These  terms  are  obtained  by  subtracting  the  first-order  secular  terms  from  the  first 
portion  of  F. 

The  mean  values  are  derived  by 


<2  = 


where  Q  represents  any  term  in  F  that  is  being  averaged,  and  the  equations. 


sin  3-u  =  0, 


(2.7) 


by  Tisserand  (1889)  are  used  to  obtain  F} ,  F2 ,  F3 ,  and  F4.  The  resulting  equations  are 


(2.8) 


» 


*i  = 


_  win  1.2 


- sin  i 


2  a3  13  2 


F2  =0 

„  3pJ2F3  .  / 
F,  =-i— i- — sun 

3  2  a4  v 


i  5  •  2'i 

1 — sin  i  e 
4 


(w) 


5 

2  sinco 


_  3  \U2R2  fflV 


2  a3  Vr) 


1  1.2. 
---sin  i 
1 3  2 


1  •  2- 
+— sin  i  cos 
2 


2(d  +  co)| 


2.  SGP's  Gravitational  Model 


This  model  follows  directly  from  the  theory  discussed  in  part  one.  It  will  be 
presented  based  on  Project  Space  Track  Report  No.  3  (Hoots  and  Roehrich,  1980), 
maintaining  a  parallel  structure  between  the  equations  and  the  computer  code, 
a.  Calculating  Initial  Constants 

Predictions  are  made  by  first  calculating  the  constants  below: 

(1)  Semi-major  axis.  This  is  expressed  by 


a,  = 


k 

*o) 


2 


(2.9) 


where 

al  =  semimajor  axis 

K  =V? 

n0  =  SGP  type  "mean"  mean  motion  at  epoch. 

(2)  Deltal  (8j).  This  is  an  intermediate  calculation  in  modifying  the  semi¬ 
major  axis  to  include  perturbation  effects  due  to  the  oblateness  of  the  Earth.  Deltal  is 
expressed  by 

(2.10) 


3  .  Of  (3 cos2 10  -l) 
4' 


5:=T^”2 


<h 


(I-.’)5 


> 
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where 


aE  =  equatorial  radius  of  Earth 
i0  ="  mean"  inclination  at  epoch 
e0  -  'mean"  eccentricity  at  epoch. 

Deltal  will  be  revealed  in  the  next  part 
(3)  Modified  Semi-major  Axis  (a0).  This  is  expressed  by 


an  =a 


-i  1  o  p  2  1 34  n  3 

1 — o,  —o, - o, 

3  1  1  81  1 


Equation  2.1 1  is  derived  in  the  following  manner: 

By  substituting  Ei  for  F  in  the  differential  equation  (Roy,  1988,  p.  316) 


dM0  _  l-ef  dF _ 2_9 F 

dt  0  n0a02e0  de0  n0a0  da 


the  mean  "mean"  anomaly  at  epoch  M0  is  obtained. 


M0  =  M0+n0t 


where  Ma  is  the  unperturbed  mean  anomaly,  with 


no  =r.„ 


I+tA - ^ - 5-(3cos!i.-l) 


where  ha  is  the  unperturbed  mean  motion.  For  convenience  let 

3  -  2 


£*=4^2 — ~ — r(3cos2*V"i)- 

4  F 


Now,  a0  is  chosen  by  convention  to  be 

a0=d0(l-Da0~2) 


where  da  is  tne  unperturbed  semi-major  axis.  Thus,  using  the  relation 


leads  to  the  equation 


a0\2  =  K2  (i  -  Dao'2 )  t1 + Dao~2 ) 


Therefore, 


•  W) 

Equation  2.12  defines  a0  implicitly.  This  equation  is  .solved  iteratively  by  using  Picard’s 
method,  where  Oj  (see  Equation  2.9)  is  used  as  the  initial  approximation  (Office  of 
Astrodynamic  Applications  HQ  Fourteenth  Aerospace  Force,  1974,  pp.  1-6),  resulting  in 
Equation  2.11.  It  should  be  noted  that  in  practice,  nu  is  obtained  through  a  differential 


2  2 

(l-ZV2)(l  +  Da0'?)?  ^(l-ZV2)(l  +  ZV2)5-  (2A2) 


correction  process  on  a  separate  computer  code  (i.e.,  the  main  program  DRIVER  reads 
this  quantity  as  part  of  the  NORAD  2-line  element  set),  and  is  accurate  .o  the  third  order 
(in  J2),  thus  a0  is  calculated  to  the  same  accuracy. 

(4)  Semilatus  Rectum  (pa).  This  is  expressed  by 

!>.=«.( !-«.’)  (2-13) 


The  semilatus  rectum  is  the  length  of  the  chord  from  the  force  center  to  the  ellipse, 
parallel  to  the  minor  axis  bb’,  as  illustrated  in  Figure  2.2. 

(5)  Radius  at  Perigee  (qa).  This  is  expressed  by  (see  Figure  2.2) 

4=  (1-0-  (2-W) 

(6)  Mean  "Mean"  Longitude  at  Epoch  ( L0 ).  This  is  the  sum  of  the  "mean” 
mean  anomaly  at  epoch  (M0),  the  "mean"  argument  of  perigee  at  epoch  (coo),  and  the 
"mean"  longitude  of  ascending  node  at  epoch.  Thus,  L0  can  be  written  as 

L0  =  M0+(i)o+Q0.  (2.15) 


(7)  Change  in  the  "mean"  longitude  of  ascending  node  with  respect  to 


tunc 


dQ 
.  dt 


The  equation  for  —  can  be  derived  from  the  relation  (Roy,  1988,  p.316) 
dt 
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> 


» 


» 


t 


> 
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dQ_  = _ 

dt  n0a? 


3F 


ea2  sini i0 


by  substituting  F,  from  Equation  2.8  for  F  in  Equation  2.16  yielding 


dCl  3  y  Gg 


dt 


-  2  7no  C0Slo< 

2  Po 


(8)  Change  in  the  "mean"  argument  of  perigee  with  respect  to  time 


The  equation  for  —  can  be  derived  from  the  relation  (Roy,  1988,  p.316) 
dt 


d(S) 

dt 


C0SI„ 


dF  .  fi-e?  dF 
n0aa2 -Jb^eJ sini0  ^  noa?e,^e° 


by  substituting  Fx  from  Equation  2.8  for  F  yielding 


i£=2,,£S;n„(5cos’i„-i). 
dt  4  n  ' 


b.  Secular  Effects  of  Atmospheric  Drag  and  Gravitation 

The  atmospheric  drag  is  taken  to  effect  mean  motion  linearly  with  time, 
resulting  ill  a  quadratic  variation  of  mean  anomaly  with  time.  Tne  drag  effect  on 
eccentricity  is  modeled  based  on  the  assumption  that  perigee  height  remains  constant 
The  following  analysis  includes  the  secular  effects  of  atmospheric  drag  and  gravitation: 

(1)  Modified  semi-major  axis  (a)  to  include  atmospheric  drag.  This  is 


expressed  by 


V 


a  =  a0 


(2.19) 


where  t-ta  is  the  time  since  epoch.  Equation  2.19  is  derived  from  equation  2.16  which 
gives 


2 


where  n  is  approximated  by  a  Taylor  series  expansion  to  the  first  order, 


n~n„ 


It  should  be  noted  that  the  term 


K 

2 


is  generated  on  a  separate  computer 


code  which  uses  a  differential  corrector.  The  main  program  DRIVER  reads  this  quantity 
as  part  of  the  NORAD  2-line  element  set 

(2)  Modified  eccentricity  ( e ).  This  is  expressed  by 


1  -~°-,fora>q0 
a 

IQ"*,  for  a<q0 


(2.20) 


14 


Since  the  perigee  radius  is  assumed  constant,  the  resulting  secular  correction  applied  to 
the  eccentricity  may  produce  negative  values  for  that  element  on  near-circular  orbits 
(Schumacher,  1991,  pp.  106-107).  A  negative  eccentricity  is  undesirable  operationally. 
Thus,  SGP  arbitrarily  resets  these  negative  eccentricities  to  a  small  positive  value 

(lo-*). 


(3)  Modified  semilatus  rectum.  This  is  expressed  by 

p  =  a(  1-e2). 


(2.21) 


(4)  Modified  "mean"  longitude  of  ascending  node  (q5<  ).  This  is  expressed 


Qj-=n"+T('~' •)• 

(5)  Modified  "mean"  argument  of  perigee  (co5  ).  This  is  expressed  by 

dco  /  ^ 

©5.  =0)o+  —  (t-tJ. 

(6)  Modified  mean  "mean"  longitude  (ls).  This  is  expressed  by 

t  t  f  da)  dO. ')/  \  h„  i  \2 


(2.22) 


(2.23) 


(2.24) 


Equation  2.24  holds  since 


d{t-t0)~ 

where,  Ms°  may  be  approximated  by 

Ms.  (2.25) 

Therefore 

Ls  =  Ms  +C0j  + 

yields  the  desired  results,  after  substituting  Equations  2.22,  2.23,  and  2.25,  and  using  the 
relation 


L0  -  M0  +  coo  + 
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c.  Long-period  Perturbations 

Long-period  perturbations  are  included  as  follows: 

(1)  Vertical  component  of  the  eccentricity  vector  with  respect  to  the  line  of 


nodes  vector  (ayNSL ).  This  is  expressed  by 

ayNSL  =  esinco^  - ^^-sin/0  =  eL  sin©  (2.26) 

l  J2p 

where  eL  and  ©  represent  the  resulting  eccentricity  and  argument  of  perigee  (see  Figure 

2.3).  Equation  2.26  is  derived  from  Brouwer  (1959)  for  long  period  perturbations. 

(2)  The  modified  mean  "mean"  longitude  (l).  This  is  expressed  by 


L  = 


i  1  I&l  a 

1  4  Jtp  *SL 


sin  i0 


3+ 5  co  s/0 
l  +  cos/0 


(2.27) 


where 

axNSL  ~  ei  C0S®  (2.28) 

represents  the  horizontal  component  of  the  eccentricity  vector  with  respect  to  the  line  of 
nodes  vector  (see  Figure  2.3).  Equations  2.27  and  2.28  are  derived  from  Brouwer 
(1959)  with  Lyddane  (1963)  modifications  for  low  eccentricities  or  inclinations. 

(3)  Kepler's  equation.  Kepler's  equation  is  solved  iteratively  for  the  sum  of 
the  eccentric  anomaly  and  the  resulting  argument  of  perigee  {E  +  ©)  where 

(£  +  ©)_+1  =(£  +  ©)_  +A(E  +  ©)_  (2.29) 


with 


A(£  +  ©),  = 


U  -  ayNSL  cos (E  +  ©),  +  a^si  sin (E  +  ©),  -  (E  +  ©), 
~^yNSL  sin(£  +  ©),  C°S(£  ■t'to),  1 

U=L-Q c 


(2.30) 


and 


(£  +  ©)[  =U. 

Equation  2.30  is  derived  from  the  relation  (Roy,  1988,  p.85) 
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» 
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Satellite's  Obit 


Figure  2.3  Eccentricity  Vector  and  Node  Vector 


since 


and 


ACE  +  oo), 


L-£ls  +  eL  sin  £;  -  (E  +  co); 
l-eL  cosE; 


eL  sinE,.  -a^  sin (E  +  co),  -ayNSL  cos(E  +  co), 

(2.3 1] 

eL  cosE;  =a^t  sin  (E  +  co),.  +0^  cos(E  +  co);. 

(4)  Modified  eccentricity.  The  square  of  the  eccentricity  can  be  expressed  by 


2 

eL 


(2.32 


This  equation  simply  comes  from  Equation  2.31. 

(5)  Modified  semilatus  rectum.  This  is  expressed  by 


» 


=  a(l-eL2). 


(6)  Radial  distance  from  Earth  to  satellite.  This  is  expressed  by 


(2.33) 


r  =  a(l-eL  cos E). 

(7)  Time  rate  of  change  of  the  radial  distance.  This  is  expressed  by 


(2.34) 


.  4a  . 

r-  kf  — eL  sinE. 

r 

This  equation  is  derived  by  differentiating  Equation  2.34  giving 


(2.35) 


dr  .  pdE 
—  -  ae,  sm  E  — 
at  L  dt 

where  (Danb>,  1962,  p.  131) 


dE  kp  1  _  ttp  a _  &£ 

dt  ^ i  6f  cciiE  \  cd  '  r >  a 

(3)  I'rcduct  of  radial  distance  and  time  rate  of  cVrge  of  die  speed  of  the 
satellite.,  v.  This  is  expressed  by 


rv~kr 


(236) 


This  equation  is  true  since  (Daaby.  1962,  p.130) 


*  vp.  r  Jp7  r 

(9)  Argument  of  perigee  (cu).  This  is  expressed  implicitly  by 


a  .  ,  „  \  e,  sin  E 

sinu-  sin(^  +  vOy  Qy'jsi  axNSL  < - r 

rL  1+v1+!.  J 


(2.37) 


(2.38) 


IS 


and 


cosm  = 


cus(£  +  to)  ~  Oxnsl  +  4 


eL  sin£ 


'jNSl 


l  +  yjl-eL2 


(2.39) 


where 

n  =  D  +  co. 

Equation  2.38  is  derived  algebraically  using  the  relations 

sin  n  =  sin  "U  cos  co  +  cos  i)  sin  to 
sintt  =  -'Jl-e2  sin  E 


cosu  = 


eL  -  cos  £ 


eL  cos  £  - 1 

combined  with  Equation  2.31.  Equation  2.39  is  derived  similarly.  Now,  u  can  be 
calculated  from 


u  =  tan 


d.  Short-period  Perturbations 


sinu  ^ 
Vcosm  J 


(2.40) 


The  short-period  perturbations  are  now  included  by 
^  2 

(1)  rk  =r  +— J2-^-sin2i'  cos2u 
4  Pi 


1  * 

(2)  uk=u — J2  — -  (7  cos2  i0  - 1)  sin  2 u 
8  Pi 


3  ,  oE 


(3)  Qk=Qs  +— J2-—cosi0sin2u 

4  Pi 

3  a 2 

(4)  I*  =  i0  +  -  sin  h  cos  K  cos  2m  . 

4  Pi 


(2.41) 

(2.42) 

(2.43) 

(2.44) 


These  short-period  perturbations  are  derived  from  Brouwer  (1959).  They  are 
the  result  of  taking  Brouwer's  equations  for  short-period  perturbations  and  dropping 


I 


•  •  1 
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terras  with  a  coefficient  of  k2e  or  smaller  (Hoots,  1975,  p  .  9). 
e.  Resulting  Position  and  Velocity  Vectors 


The  unit  orientation  vectors  are  calculated  by  (see  Figure  2.4) 


where 


U  =  Msinw*  +  Ncosut 
V  =  Mcosh*  -Nsinw* 


M  = 


Mx 

My 

M7 


=  -sin  Cl  t  cos/* 

=  cos  £2*  cos/*  • 
=  sin  /* 


and 


=  cos  £2* 

=  sin£2*  k 


1*2=  0 


The  position  and  velocity  vectors  are  then  given  by 


r  =  rtU 


r  =  rU+(rv)V. 


(2.45) 


(2.46) 


(2.47) 

(2.48) 


Figure  2.4  Unit  Orientation  Vectors 


•  •  • 


•  •  • 
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HL  SGP4  AND  SDP4  SATELLITE  MOTION  MODELS 


A.  OVERVIEW 

The  SGP4  and  the  SDP4  models  are  the  satellite  motion  models  currently  used  by 
the  AFSPACECOM.  Given  the  six  mean  orbital  elements,  an  epoch  reference  time,  and 
a  drag  factor,  SGP4  or  SDP4  can  predict  the  position  and  velocity  at  a  future  time.  If 
the  satellite  is  "near-earth"  (i.e.,  orbit  period  less  than  225  minutes),  SGP4  propagates  the 
ephemerides,  otherwise  the  "deep-space"  satellites  are  propagated  by  SDP4.  Like  SGP, 
SGP4  considers  perturbing  accelerations  due  to  the  oblateness  of  the  Earth,  asymmetry  of 
the  Earth's  mass  about  the  equatorial  plane,  and  atmospheric  drag.  The  differences 
between  SGP  and  SGP4  are  that  SGP4  includes  zonal  harmonics  through  J4  whereas 
SGP  only  includes  through  J3,  SGP4  includes  some  "deep-space"  terms  not  found  in 
SGP,  and  SGP4  includes  a  drag  force  in  its  equatiois  of  motion  where  SGP  performs  a 
simple  correction  technique  to  include  the  effects  of  atmospheric  drag.  The  SDP4  model 
includes  the  same  perturbing  effects  as  SGP4,  but  also  includes  effects  due  to  the 
gravitational  attraction  of  the  sun  and  moon,  and  some  sectoral  and  tesseral  harmonics 
derived  from  the  longitudinal  variations  of  the  Earth's  gravitational  potential. 

In  Chapter  n  classification  of  satellite  motion  models  was  discussed.  Like  SGP,  the 
SGP4  and  SDP4  models  solve  the  equations  of  motion  analytically,  and  the  variation  in 
the  satellite's  motion  is  in  terms  of  the  changes  in  the  osculating  elements  with  respect  to 
time,  thus  SGP4  and  SDP4  are  general  perturbation  models  that  use  variation  of 
elements. 


B.  THEORY 

1.  SGP4  Model 

Here  a  general  discussion  is  given  as  presented  by  Paul  Schumacher  (1991). 
SGP4  is  based  on  the  theory  of  Lane  and  Cranford  (1969)  which  uses  the  work  of 
Brouwer  (1959)  and  Brouwer  and  Hori  (1961).  Brouwer  and  Hori  generalized  the 
original  drag-free  Brouwer  model  by  including  a  drag  force  in  the  equations  of  motion. 
This  involved  modeling  the  atmosphere  as  a  spherically  symmetric,  and  non-rotating 
entity.  The  density  is  assumed  to  decrease  exponentially  with  altitude,  providing  two 
free  parameters  (reference  density  and  scale  height)  which  can  be  chosen  to  represent  the 
real  atmosphere  over  some  altitude  range.  This  atmospheric  model  is  combined  with  a 
gravitational  force  model  which  includes  zonal  harmonics  J2 ,  J3 ,  and  J4.  The  equations 
of  motion  are  written  in  Delaunay  variables  and  treated  by  von  Zeipel's  method. 
Unfortunately,  this  model  was  discovered  to  be  very  inaccurate  for  satellites  with  low 
perigee  altitudes,  exactly  where  the  satellite  is  nearing  decay. 

The  work  of  Lane  (1965)  and  Lane  and  Cranford  (1969)  attempted  to  overcome 
the  problems  of  inaccuracy  presented  by  the  Brouwer  and  Hori  model.  As  a  possible 
improvement  to  the  exponential  density  model.  Lane  assumed  that  the  scale  height 
parameter  varies  linearly  with  altitude.  This  assumption  led  to  a  new  description  of  the 
density,  namely,  an  "almost-power"  function  (Fitzpatrick,  1970).  This  function  involves 
three  parameters:  reference  density,  reference  scale  height,  and  scale  height  gradient. 
The  gravitational  model  includes  zonal  harmonics  J2,J3,JA,  and  J5.  Included  in  the 
solution  are  complete  first-order  short-periodic  and  secular  terms,  some  second-order 
secular  terms,  and  all  first-order  long-periodic  terms,  except  for  the  mean  anomaly  which 
only  includes  drag-related  terms.  Singularities  resulting  from  an  eccentricity  or 
inclination  equal  to  zero  are  avoided  by  Lyddane's  approach  (1963)  of  replacing 
Delaunay  variables  with  Poincare  variables.  In  1971  Cranford  established  these  results 


as  the  SGP4  theory,  and  by  1976  SGP4  had  replaced  SGP  as  the  operational  theory  at  the 
AFSPACECOM.  Like  SGP,  SGP4  uses  similar  compromises  in  its  implementation.  All 
J5  terms,  ail  0(1O-3)  short-period  terms  and  several  long-period  terms  are  not  included 
in  the  solution. 

The  satellite  drag  density  is  measured  by  the  modified  ballistic  coefficient,  B\ 

This  coefficient  is  a  product  of  satellite  drag  coefficient,  area-to-raass  ratio  and  a 

combination  of  density  profile  parameters.  The  modified  ballistic  coefficient,  B\  is 

obtained  from  a  differential  correction  process  and  is  one  of  the  input  parameters  to 

dn 

SGP4.  It  is  somewhat  analogous  to  the  —  parameter  used  for  SGP.  Due  to  the 

implementation  compromises  and  the  difficulty  in  acquiring  accurate  density  values  from 
a  simple  model,  SGP4  is  considered  inaccurate  for  rapidly  decaying  satellites. 

2.  SDP4  Model 

Here,  again,  a  general  discussion  is  given  as  presented  by  Paul  Schumacher 
(1991).  SDP4  includes  most  of  the  SGP4  implementation,  but  also  includes  some  deep- 
space  perturbations  developed  by  Hujsak  (1979).  As  mentioned  previously,  SDP4  only 
applies  to  satellites  with  period  greater  than  225  minutes.  This  model  includes  leading 
terms  for  lunar  and  solar  attraction,  as  well  as  several  longitudinal  dependent  Earth 
geopotential  harmonics.  Secular  lunar  and  solar  perturbations  are  obtained  by  averaging 
over  the  satellite  period,  and  the  remaining  short-period  terms  are  neglected. 
Singularities  are  found  in  the  secular  formulae  at  zero  values  of  inclination  and  are  dealt 
with  by  omitting  sensitive  terras  when  the  inclination  is  relatively  small.  A  second 
averaging  over  the  periods  of  the  moon  and  sun  respectively  provides  additional  secular 
terms  and  some  short-period  terms.  Earth  geopotential  resonance  corrections  are  used 
only  for  in-track  position.  Although  SDP4  has  its  shortcomings  in  its  calculations  of  the 
perturbatious,  it  is  the  only  "deep-space"  analytical  model  that  is  used  operationally. 
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IV.  PARALLEL  COMPUTING 


» 


This  chapter  closely  follows  the  parallel  computing  chapter  written  by  Warren  E. 
Phipps  in  his  thesis  on  the  parallelization  of  the  NAVSPASUR  model,  PPT2  (1992). 

A.  OVERVIEW 
1.  Definition 

As  scientific  models  become  increasingly  more  complex  and  detailed,  the 
computer  requirements  in  terms  of  amount  of  computation  and  speed  become  more 
demanding.  Computer  engineers  have  responded  by  taking  two  approaches  to  achieve 
faster  performance. 

The  first  approach  is  to  increase  the  speed  of  the  circuitry.  Although  great 
advances  have  been  made  in  this  area,  the  speed  is  bounded  by  the  speed  of  light  Also, 
the  design  and  manufacture  requirements  for  continued  increases  in  speed  are  very 
costly.  The  second  approach  is  the  use  of  parallel  computing.  Parallel  computing  or 
parallel  processing  provides  an  alternate  means  to  achieve  faster  computer  performance 
at  a  more  affordable  price.  In  this  new  field,  various  definitions  in  how  to  define  parallel 
computing  exist  One  definition  which  describes  parallel  computing  most  completely 
and  concisely  may  be  taken  from  (Hwang,  1984,  p.6): 

Parallel  processing  is  an  efficient  form  of  information  processing  which 
emphasizes  the  exploitation  of  concurrent  events  in  the  computing  process. 
Concurrency  implies  parallelism,  simultaneity,  and  pipelining.  Parallel  events  may 
occur  in  multiple  resources  during  the  same  time  interval;  simultaneous  events 
may  occur  at  the  same  time  instant;  and  pipelined  events  may  occur  in  overlapped 
time  spans.  These  concurrent  events  are  attainable  in  a  computer  system  at  various 
processing  levels. 
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In  his  book,  Hwang  describes  the  processing  levels.  The  top  level  is  referred  to  as 
the  program  level  which  involves  executing  multiple  programs  by  means  of 
multiprogramming,  time  sharing,  and  multiprocessing.  This  depth  of  parallel  processing 
is  beyond  the  scope  of  this  thesis.  The  lower  levels  (i.e.,  task  level,  inter-instruction 
level,  and  intra-instruction  level)  involve  the  execution  of  a  single  program  which  are 
applicable  to  the  programming  done  for  this  research.  Thus,  for  the  purpose  of  this 
thesis,  parallel  computing  is  defined  as  the  efficient  form  of  information  processing 
emphasizing  the  concurrent  computations  and  manipulation  of  data  to  solve  a  single 
problem. 

2.  Classification  of  Parallel  Computers 
a.  Type  Classifications 

Implicit  in  the  definition  of  parallel  computing  are  three  methods  to  achieve 
parallelism:  temporal  parallelism,  spatial  parallelism,  and  asynchronous  parallelism 
(Hwang,  1984,  p.  20).  These  methods  provide  a  means  to  classify  the  various  types  of 
parallel  computers. 

The  first  type  is  a  pipeline  computer.  Pipeline  computers  perform  overlapped 
computations ,  thus  use  temporal  parallelism.  Computations  are  broken  into  a  number  of 
segments  or  stages  where  the  output  of  one  segment  is  the  input  of  another  segment  If 
all  the  segments  work  at  the  same  speed,  the  work  rate  of  the  pipeline  is  equal  to  the  sum 
of  the  work  rates  of  the  segments,  once  the  pipe  is  full.  This  is  analogous  to  an  assembly 
line  in  a  factory.  An  example  of  a  pipeline  computer  is  a  Cray-1. 

The  second  type  is  a  processor  array.  A  processor  array  is  a  set  of  identical 
synchronized  processing  elements  to  achieve  spatial  parallelism.  It  is  capable  of 
simultaneously  performing  the  same  operation  on  different  data.  An  example  of  a 
processor  array  is  the  Connection  Machine. 
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The  third  type  is  a  multiprocessor.  The  terra  multiprocessor  refers  to  a  shared- 
memory  multiple-CPU  computer  designed  for  parallel  processing.  These  CPU’s  or 
processors  are  capable  of  performing  independent  operations  and  may  achieve 
asynchronous  parallelism  (i.e.,  the  processors  have  the  ability  to  work  with  the  most 
recently  available  data).  An  example  of  a  multiprocessor  is  the  Cm*  of  Camegie-Mellon 
University. 

The  final  type  is  a  derivative  of  the  multiprocessor,  the  multicomputer.  The 
multicomputer  is  a  multiple  CPU  computer  designed  for  parallel  processing  without  the 
shared  memory.  Multicomputers  can  also  achieve  asynchronous  parallelism  through 
communications  between  the  processors  or  nodes.  An  example  of  a  multicomputer  is  the 
INTEL  iPSC/2  hypercube.  Since  each  processor  has  its  own  memory  and  can  perform 
independent  operations,  multicomputers  offer  a  greater  degree  of  freedom  in 
programming.  However,  communication  between  the  nodes  may  require  that 
synchronization  be  programmed  into  the  multicomputer  code  to  achieve  the  objective  of 
the  code. 

The  four  type  classifications  are  not  necessarily  mutually  exclusive.  For 
example  many  commercially  available  processor  arrays,  multiprocessors,  and 
multicomputers  use  pipeline  processors  to  complete  operations  such  as  vector  processing. 

b.  Architectural  Classifications 

Parallel  computers  can  also  be  classified  according  to  their  architecture.  One 
such  classification  was  devised  by  Flynn  (1966).  He  categorized  digital  computers  by 
the  multiplicity  of  hardware  used  to  manipulate  instruction  and  data  streams.  Four  classes 
of  computers  resulted  from  this  : 

1.  Single-Instruction  stream,  Single  Data  stream  (SISD).  Most  serial  computers  fall 
into  this  category.  Although  instructions  are  completed  sequentially,  this  category 
includes  overlapping  instructions  (pipelining).  Therefore,  pure  pipeline  processors  also 
belong  to  this  category. 
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2.  Single  Instruction  stream,  Multiple  Data  stream  (SIMD).  Processor  arrays  fail  into 
this  category.  The  processor  array  receives  a  single  set  of  instructions,  but  each  element 
receives  and  manipulates  its  own  set  of  data. 

3.  Multiple  Instruction  stream,  Single  Data  stream  (MISD).  No  current  computers 
fall  into  this  category. 

4.  Multiple  Instruction  stream,  Multiple  Data  stream  (MIMD).  Most  multiprocessors 
and  multicomputers  fall  into  this  category.  The  INTEL  iPSC/2  is  a  MIMD  machine. 

c.  Topological  Classifications 

Another  classifying  scheme  for  parallel  computers  that  only  apply  to  processor 
arrays,  multiprocessors,  and  multicomputer  concerns  the  topology  of  the  inter-processor 
connections.  These  connections  are  the  means  through  which  processors  can 
communicate  with  one  another.  Seven  important  processor  organizations  are  the  mesh, 
the  pyramid,  the  fat  tree,  die  butterfl) ,  the  shuffle-exchange,  cube-connected  cycles  and 
the  hypercube.  Figure  4.1  show  examples  of  the  first,  six  processor  organizations 
mentioned  Figure  4.2  shows  examples  of  the  hypercube  topology.  This  thesis  will 
obviously  emphasize  the  hypercube  topology.  For  a  more  complete  discussion  on  the 
other  topologies  see  (Qtinn,  1987,  pp.  25- 3C,. 

3.  Measurements  of  Performance 

The  objective  of  parallel  computing  is  m  ore  vide  faster  computation,  thus  certain 
define  ’  parameters  are  needed  to  measure  1m  performance  of  the  parallel  computer 
versus  the  serial  computer.  Computation  speeds  depend  on  many  factors  such  as 
hardware  design,  technical  specifications  of  we  compute:  components,  and  the  design  of 
the  algorithm  to  execute  the  computations,  Two  common  measures  of  effectiveness  that 
account  for  bo^h  the  hardware,  and  the  algorithm  design  are  speedup  and  efficiency. 

Speedup,  Sp,  is  defined  as  Lie  ratio  between  the  time  needed  to  execute  the  most 

efficient  sequential  algorithm  to  perform  a  set  of  computations,  Ts,  and  the  time  to 
perform  these  same  computations  using  parallelism,  Tp, 
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Figure  4.1  Various  processor  organizations 


Figure  4.2  Hypercubes  of  dimension  zero  through  four 
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It  should  be  emphasized  that  even  though  the  serial  program  and  the  parallel  program 
execute  the  same  set  of  computations,  the  two  programs  do  not  necessarily  follow  the 
same  algorithm.  Parallel  programs  often  include  additional  instructions  or  operations  to 
accommodate  parallelism.  Also,  as  previously  mentioned,  the  serial  program  should  be 

the  most  efficient  obtainable,  so  as  not  to  be  misleading  in  the  calculation  of  speedup. 
Many  suggest  that  the  times,  Tp  and  Ts,  be  measured  using  a  specific  parallel  computer 

and  the  fastest  serial  computer.  But,  the  variation  in  the  technical  specifications  between 
the  two  computers  make  the  comparison  unclear,  thus  do  not  provide  an  effective  means 

of  assessing  the  effectiveness  of  parallel  computing.  Therefore,  for  the  purpose  of  this 
thesis,  Sp  is  defined  as  the  ratio  of  time,  Tv  taken  by  the  parallel  computer  to  execute  the 

most  efficient  serial  algorithm  and  the  time,  Tp,  taken  by  the  same  parallel  computer 
executing  the  parallel  algorithm  using  p  processors. 
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The  other  measure,  efficiency,  Ep,  is  defined  as  the  ratio  of  the  speedup  to  the 
number  of  processors, 


(4.3) 


Efficiency  accounts  for  the  relative  cost  in  terms  of  number  of  processors  required  in 
achieving  a  certain  speedup. 

Many  factors  can  limit  the  possible  speedup  and  efficiency  of  a  parallel  program. 
These  factors  include  the  number  of  sequential  operations  that  cannot  be  parallelized,  the 
communication  time  among  processors,  and  the  time  each  processor  is  idle  due  to 
synchronization  requirements.  Many  have  argued  that  these  factors  severely  limit  the 
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benefits  of  parallel  computing.  Regardless  of  these  factors,  research  demonstrates 
parallel  computing  as  an  effective  means  to  reduce  computation  time  (Quinn,  1987,  pp. 
18-20).  If  one  considers  only  the  number  of  sequential  operations  in  a  program  that 
cannot  be  parallelized,  Amdahl's  Law  indicates  that  the  maximum  speedup  achievable  by 
p  processors  is. 


5  < _ : _ 

P  /  +  (l-/)/p* 


(4.4) 


where  /  is  the  fraction  of  operations  that  must  be  performed  sequentially  (Amdahl, 
1967,  pp.  483-485).  Equation  4.4  provides  an  initial  means  to  see  if  a  given  algorithm  is 
a  good  candidate  for  parallelization. 


B.  INTEL  i  PS  C/2  HYPERCUBE 

The  design  of  a  parallel  algorithm  is  done  with  a  specific  parallel  computer  in  mind. 
This  design  involves  maximizing  the  speedup  and  efficiency.  In  determining  the  parallel 
computing  potential  of  the  AFSPACECOM  satellite  motion  models,  SGP  and  SGP4,  an 
INTEL  iPSC/2  hypercube  computer,  located  at  the  Department  of  Mathematics  at  the 
Naval  Postgraduate  School,  was  used.  This  computer  is  a  MIMD  multicomputer  with  a 
hypercube  topology.  It  consists  of  a  system  resource  manager  called  the  host,  and  eight 
individual  processors,  referred  to  as  nodes.  The  host  is  a  386-based  computer,  which  can 
process  data  as  well  as  providing  an  interface  for  the  user. 

The  nodes  are  complete  and  independent  INTEL  80386  microprocessors.  Each 
node  also  contains  a  80387  numeric  coprocessor,  its  own  local  memory,  and  a  Direct- 
Connect  Communications  Module  (DCM).  Each  node  may  also  contain  a  Vector 
Extension  (VX)  module  for  pipelined  vector  operations.  The  iPSC/2  located  at  the  Naval 
Postgraduate  School  has  only  one  node  that  contains  the  VX  module. 
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Communications  among  the  nodes  and  host  are  done  with  message  passing.  The 
DCM  permits  messages  to  be  sent  from  an  individual  node  to  a  receiving  node  directly 
without  disturbing  the  remaining  nodes.  Each  node  along  the  message  path  can  be 
thought  of  as  a  switch  which  simply  closes  when  a  message  is  sent  Other  hypercube 
designs  require  messages  to  be  stored  and  forwarded  with  each  node  along  a  message 
path  until  the  message  is  received  by  the  destined  node. 

The  iPSC/2  uses  a  UNIX  operating  system  and  the  available  programming  languages 
are  Fortran  or  C.  For  a  more  detailed  listing  of  the  technical  specifications  of  the 
INTEL  iPSC/2  hypercubes  see  Appendix  A. 

C.  METHODS  OF  PARALLELIZATION 
1.  Vectorization 

Vectorization  is  the  process  of  converting  blocks  of  sequential  operations  into 
vector  instructions  that  may  be  pipelined.  A  simple  example  of  vectorization  using 
Fortran  is  the  following: 

Sequential  Code: 


Do  10  i  =  1,2V 
10  z(i)  =  x(i)  +  y(i) 


Vector  Code  (VAST2): 

call  vadd(N,  x,  l,y,  l,z,  1) 

VAST2  is  an  example  of  a  vectorization  compiler  to  assist  in  the  vectorization  of  a  serial 
program.  It  is  used  by  the  iPSC/2  at  the  Naval  Postgraduate  School.  There  are  many 
other  vectorization  compilers  that  are  also  available  commercially.  (Quinn,  1987,  pp. 
233-235) 

Vectorizing  compilers  automatically  vectorize  serial  codes  before  execution. 
Some  vectorizing  compilers  may  even  indicate  to  the  user  areas  of  the  program  that  limit 
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potential  vectorization.  Unfortunately,  vectorizing  compilers  are  not  solely  capable  of 
maximizing  vectorization.  Most  vectorizing  compilers  are  restricted  in  their  ability  to 
identify  sequential  blocks  that  can  be  vectorized  and  the  resulting  vectorization  may  not 
always  be  straight  forward.  The  VAST2  compiler,  for  example,  supports  only  Fortran 
programs  and  is  only  able  to  vectorize  do  loops  and  if  statements.  (iPSC/2  VAST2 
User's  Guide,  1939) 

2.  Distributing  Computations 

Vectorization  illustrates  the  first  level  of  parallelism,  in  that  tasks  are  parallelized 
on  individual  processors.  On  the  other  hand,  ro  distribute  partitions  or  a  program  to 
different  processors  on  a  multicomputer  another  approach  is  needed.  Compilers  created 
to  achieve  this  higher  level  of  parallelism  have  not  been  successfully  implemented  as 
with  the  vectorization  compilers.  Thus,  the  task  of  efficiently  parallelizing  an  algorithm 
is  left  to  the  user. 

Since  the  performance  of  parallel  algorithms  depend  on  factors  such  as  processor 
speed,  memory  access  time,  and  memory  capacity,  in  other  words,  the  techmcal 
specifications  of  the  parallel  computer,  the  process  of  parallelizing  an  algorithm  must  be 
done  with  a  particular  parallel  computer  in  mind.  The  multicomputer  where  each  node 
has  its  own  memory  presents  the  greatest  flexibility  to  the  user.  Here,  the  user  has  to 
partition  che  problem  among  the  processor  nodes.  The  hyperrube  topology  allows  the 
user  to  use  the  natural  topology  of  a  given  problem  to  divide  the  problem  into  parallel 
processes.  A  process  is  defined  as  a  single  statement  oi  a  group  of  statements  which  are 
a  self-contained  portion  of  the  total  computations.  For  the  INTEL  iPSC/2,  two 
decomposition  strategies  are  suggested:  Control  Decomposition  and  Domain 
Decomposition.  (iPSC/2  User's  Guide,  1990,  pp.  4-1  -  4-6) 


a.  Control  Decomposition 

Control  Decomposition  is  the  method  of  dividing  tasks  or  processes  among  the 
individual  processors  or  nodes,  a  divide  and  conquer  approach  to  the  problem.  This 
method  is  recommended  for  problems  with  irregular  data  structures  or  unpredictable 
control  flows. 

One  way  control  decomposition  is  implemented  is  by  using  a  self-schedule 
approach.  One  node  acts  as  the  manager  while  the  remaining  nodes  act  as  workers.  The 
managing  node  maintains  a  list  of  processes  to  be  accomplished  by  the  working  nodes 
and  distributes  these  processes  accordingly.  The  working  nodes  request  jobs,  receive 
processes,  and  perform  the  given  task.  Of  course,  in  the  self-scheduling  method  the  cost 
is  one  processor,  namely,  the  managing  processor.  (iPSC/2  User's  Guide,  1990,  p.  4-4) 

A  second  method  of  control  decomposition  involves  the  pre-scheduling  of  the 
processes.  The  exact  tasks  of  each  processor  are  explicitly  stated  in  the  parallel  program. 
Although,  this  method  saves  the  cost  of  one  node,  a  great  deal  of  care  must  be  taken  to 
ensure  the  processes  are  distributed  as  evenly  as  possible  among  the  nodes. 

b.  Domain  Decomposition 

Domain  Decomposition  involves  dividing  the  input  data  or  domain  among  the 
nodes.  These  partitioned  sets  may  be  specific  data  sets  such  as  blocks  of  a  matrix  or  may 
represent  a  specific  grid  such  ?s  used  in  finite  difference  or  finite  element  methods  to 
solve  partial  differential  equations.  Unlike  control  decomposition,  domain 
decomposition  requires  that  each  node  perform  essentially  the  same  tasks  but  with 
different  input  data. 

Domain  decomposition  is  suggested  if  a  program  contains  calculations  based 
on  a  large  data  structure  and  if  the  amount  of  work  is  the  same  for  each  node.  For 
example,  the  multiplication  of  two  large  matrices  by  block  multiplication  uses  domain 
decomposition.  Although  domain  decomposition  may  appear  to  be  perfectly 


paraUelizable,  the  user  must  use  caution  to  ensure  each  input  set  requires  about  the  same 
amount  of  work. 

3.  Improving  Performance 

The  parallelization  of  a  problem  may  require  the  use  of  control  decomposition, 
domain  decomposition,  or  a  combination  of  both  methods  to  be  an  efficient  algorithm. 
Once  a  particular  method  is  chosen,  several  factors  need  be  considered  to  improve  the 
performance  of  the  parallel  algorithm.  These  factors  include  bad  balance, 
communication  to  computation  ratio,  and  sequential  bottlenecks. 

a.  Load  Balance 

Load  balance  refers  to  the  degree  to  which  all  nodes  are  working  or  active.  In 
the  case  where  the  work  is  not  evenly  distributed  among  the  nodes,  the  parallel  algorithm 
will  be  limited  in  its  speedup  ability.  Ways  to  achieve  load  balancing  is  through  a 
decrease  in  grain  size  of  the  parallel  tasks,  self-scheduling  tasks,  or  redistributing  the 
domain.  Grain  size  refers  to  the  relative  amount  of  work  completed  in  parallel.  For 
example,  distributing  computations  may  be  considered  as  large  grain  parallel  computing 
while  pipelined  vector  operations  are  small  grain  parallel  computing. 

b.  Communication  to  Computation  Ratio 

Communication  to  computation  ratio  is  the  ratio  of  the  time  spent 
communicating  to  the  time  spent  computing.  The  time  loss  for  communication.;  is 
unavoidable  in  parallel  algorithms,  except  for  perfectly  parallel  problems.  A  large 
communication  to  computation  ratio  restricts  the  performance  of  a  parallel  program.  The 
objective  is  to  maximize  the  time  a  node  spends  computing,  thus  minimizing  the  time  it 
is  communicating.  Reducing  the  communication  to  computation  ratio  may  be 
accomplished  by  increasing  the  grain  size,  grouping  messages,  or  recalculating  values 
yice  receiving  the  value  from  another  node. 
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c.  Sequential  Bottlenecks 


» 


In  some  cases  a  task  cannot  begin  unless  a  previous  task  is  completed,  thus 
limiting  the  number  of  tasks  that  can  be  done  in  parallel.  A  sequential  bottleneck  is  the 
situation  where  processors  are  waiting  for  another  processor  to  complete  a  task  before 
they  may  continue.  The  fraction  of  operations  of  a  algorithm  that  cannot  be  done  in 
parallel  can  greatly  limit  speedup  as  seen  by  Amdahl's  Law  (Equation  4.4).  Sequential 
bottlenecks  are  potentially  found  with  any  requirements  of  the  nodes  to  synchronize.  The 
only  way  to  remove  sequential  bottlenecks  is  to  modify  or  restructure  the  algorithm  in 
order  to  overlap  sequential  cede  with  other  computations. 
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V.  PARALLELIZATION  OF  SGP,  SGP4,  AND  SDP4 


The  objective  of  this  thesis  is  to  assess  the  parallel  computing  potential  of  the 
AFSPACECOM's  satellite  motion  models  SGP,  SGP4,  and  SDP4.  This  analysis  may  be 
done  by  comparing  the  speedups  and  efficiencies  of  different  parallel  algorithms 
designed  using  the  various  methods  and  strategies  discussed  in  Chapter  IV.  Warren 
Phipps  (1992)  concluded  in  his  thesis  that  vectorization  and  control  decomposition  were 
not  beneficial  in  reducing  the  computation  time  of  the  NAVSPASUR  model  (PPT2), 
whereas  the  domain  decomposition  strategy  showed  promise.  Since  like  the  PPT2 
model,  the  SGP,  SGP4,  and  SDP4  models  are  analytic  and  have  computation  times  of  the 
same  order  of  magnitude"  it  was  decided  that  this  thesis  would  only  investigate  the 
domain  decomposition  strategy  for  all  three  models. 

The  objective  of  the  domain  decomposition  method  is  to  the  reduce  the  SGP,  SGP4, 
and  SDP4  models'  computation  time  by  the  concurrent  computation  of  several  satellite 
data  sets.  Each  node  of  the  hypercube  performs  identical  tasks  on  different  satellite  data 
sets,  at  the  same  time.  Thus,  as  was  considered  with  PPT2,  the  ultimate  goal  is  to  reduce 
the  overall  computation  time  for  several  objects  in  orbit 

A.  ALGORITHM 

The  parallel  algorithm  design  used  for  SGP,  SGP4,  and  SDP4  is  identical  to  the 
parallel  design  used  by  Phipps  (1992)  to  achieve  the  domain  decomposition  of  PPT2.  A 
single  node  is  devoted  to  reading  all  the  input  data  and  distributing  this  data  to  the 
working  nodes  (i.e.,  those  processors  that  actually  propagate  the  given  input  data  by 
either  calling  the  SGP,  SGP4,  or  SDP4  subroutine  to  produce  a  position  and  velocity 
vector  for  each  time  required  beyond  epoch),  and  a  single  node  is  to  collect  the  results 
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and  write  them  to  an  output  file.  Figure  5.1  illustrates  how  the  satellite  data  is  distributed 
for  a  three  dimensional  hypercube. 

In  his  thesis,  Phipps  presented  the  advantages  and  disadvantages  of  using  this 
particular  parallel  algorithm  design: 
advantages: 

*  Relatively  simple  to  apply.  No  requirement  for  communication  or  synchronization 
among  the  working  nodes. 

*  The  existing  subroutine  (i.e.,  PPT2  for  Phipp’s  thesis,  and  SGP,  SGP4,  or  SDP4  for 
this  thesis)  may  be  used  with  only  minor  modifications.  The  same  input  parameters 
can  be  used  for  the  parallel  program  as  for  the  serial  program. 

*  The  strategy  to  achieve  maximum  efficiency  is  reduced  to  developing  an  algorithm 
to  distribute  the  data  in  a  timely  manner.  Thus,  decreasing  the  wait  time  for  any  one 
node. 

disadvantages: 

*  Potential  for  sequential  bottlenecks  at  input/output  portions  of  algorithm.  Reading 
and  writing  to  an  external  file  is  extremely  time  consuming.  With  the  iPSC/2 
hypercube  used  for  this  thesis  and  Phipp's  thesis  the  input/output  is  completed 
sequentially. 

*  The  cost  is  the  loss  of  two  nodes,  the  input  node  and  the  output  node. 

A  complete  listing  of  the  source  codes,  and  input/output  parameters  for  the  parallelized 
versions  of  SGP,  SGP4,  and  SDP4  are  contained  in  Appendix  B. 

The  serial  algorithms  used  for  this  research  for  SGP,  SGP4  and  SDP4  are  minor 
modifications  of  the  original  algorithms  obtained  from  the  AFSPACECOM.  The 
modified  serial  algorithms  read  the  entire  batch  of  satellite  data  before  propagating  any 
of  the  input  For  example,  suppose  1000  satellites  need  to  be  propagated,  the  modified 
serial  algorithm  would  read  the  input  data  for  all  1000  satellites  and  proceed  to  propagate 
the  position  and  velocity  vectors  one  right  after  the  other  for  each  satellite.  On  the  other 


Figure  5.1  Distribution  of  Satellite  Data 


hand,  the  original  serial  algorithms  read  in  one  satellite,  propagate  the  data,  read  in  the 
next  satellite,  and  propagate  its  data,  etc.  This  modification  was  performed  to  create 
more  of  a  similarity  between  the  serial  algorithms  and  the  parallel  algorithms.  It  should 
be  noted  that  the  execution  times  of  the  modified  serial  algorithms  are  equivalent  to  the 
original  serial  algorithms. 

The  execution  times  for  the  serial  and  parallel  programs  for  each  of  the  models 
(SGP,  SGP4,  and  SDP4)  were  obtained  by  using  the  internal  clocks  contained  in  each 
processor.  For  the  parallel  programs,  the  processor  that  depicted  the  largest  amount  of 
time  for  execution  was  used.  Each  execution  time  is  the  result  of  an  average  of  ten 
recorded  execution  times. 


1.  Assessment  of  SGP 


a.  Results 

The  experimental  results  in  the  analysis  of  the  parallelization  of  SGP  depict 
similar  trends  and  conclusions  as  obtained  by  Phipps  (1992)  in  his  analysis  of  PPT2. 
Figure  5.2  is  a  plot  of  the  mean  execution  time  for  the  serial  version  and  the  parallelized 
versions  of  SGP,  using  a  four-node  and  an  eight-node  hypercube,  versus  the  number  of 
satellites  propagated.  See  Appendix  C  for  the  execution  times.  As  seen  by  Figure  5.2, 
the  parallelized  version  of  SGP  was  successful  in  reducing  overall  computation  time. 
For  convenience  the  serial  version  of  SGP  will  be  referred  to  as  SSGP  and  the  parallel 
version  as  PSGP.  This  notation  will  also  be  used  in  the  assessment  of  SGP4  and  SDP4. 
Table  5.1  gives  the  efficiencies  and  speedups  for  different  numbers  of  satellites  using 
four  nodes  and  eight  nodes.  This  table  depicts  a  significant  increase  in  efficiency  using 
eight  versus  four  nodes.  This  increase  in  efficiency  presents  the  possibility  of  achieving 
even  higher  efficiencies  with  the  use  of  PSGP  for  higher  dimensional  hypercubes.  Table 
5.1  also  indicates  an  approximate  three  to  one  ratio  in  the  speedups  obtained  using  eight 
nodes  versus  four  nodes.  This  is  expected  since  the  eight  node  hypercube  contains  six 
"working"  nodes,  which  is  three  times  larger  than  the  two  "working"  nodes  used  in  the 
four  node  hypercube. 

Another  trend  observed  in  Table  5.1,  more  notably  for  the  case  with  four 
nodes,  is  the  slight  increase  in  performance  to  a  peak  value  and  then  slightly  decreasing 
as  the  number  of  satellites  increases.  For  this  parallel  algorithm  the  computation  to 
communication  ratio  does  not  vary  with  the  number  of  satellites.  In  agreement  with 
Phipps  (1992),  the  increase  in  performance  must  mainly  be  due  to  the  lessening  impact  of 
the  algorithm's  overhead  on  total  execution  time.  This  overhead  includes  some  small 
computations  performed  by  each  working  node  to  determine  the  number  of  satellites  it 
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Figure  5.2  SGP  Execution  Times 


TABLE  5.1  PSGP  PERFORMANCE 


Number  of 

Eieht 

Nodes 

Four 

'lodes 

Satellites 

s. 

B, 

EP 

6 

2.76 

.345 

1.44 

.360 

12 

3.46 

.433 

1.47 

.368 

36 

4.11 

.514 

1.52 

.380 

144 

4.41 

.552 

1.51 

.376 

216 

4.49 

.561 

1.51 

.377 

500 

4.49 

.561 

1.49 

.374 

1296 

4.49 

.561 

1.41 

.352 

5000 

4.48 

.561 

1.49 

.373 

10000 

4.47 

.559 

1.49 

.372 
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must  receive  from  the  distributing  node.  Since  these  computations  are  only  completed 
once  in  the  program,  the  time  cost  with  these  calculations  become  negligible  as  the 
number  of  satellites  propagated  increases.  One  possible  phenomenon  to  offset  this 
increase  may  have  to  do  with  the  amount  of  message  sets  contained  in  the  local  buffer  of 
a  working  node  in  terms  of  the  time  cost  to  manage  these  sets  (i.eM  each  message  set 
corresponds  to  a  satellite),  which  may  increase  more  than  linearly  with  the  number  of 
satellites.  If  a  node  is  not  ready  to  receive  a  message  it  stores  it  in  a  local  buffer.  As 
will  be  explained  in  the  next  section,  the  use  of  four  and  eight  nodes  result  in  the  working 
nodes  having  message  sets  received  from  the  distributing  node  stored  in  their  local 
buffers.  For  a  large  number  of  satellites,  several  messages  will  be  stored  in  each  local 
buffer  soon  after  the  program  is  initiated.  Although  the  time  for  a  node  to  read  from  its 
local  buffer  is  insignificant,  the  time  involved  in  managing  several  storage  sets  may  not 
be.  The  efficiency,  in  the  case  of  eight  nodes,  peaks  and  levels  out  at  .561  for  216 
satellites  and  decreases  only  slightly  to  .559  at  10000  satellites.  The  efficiency  for  four 
nodes  peaks  at  .380  for  36  satellites  and  decreases  to  .372  at  10000  satellites, 
b.  Improvements 

As  stated  in  the  previous  section,  PSGP  raay  yield  increased  efficiencies  and 
speedups  if  applied  to  a  hypercube  of  greater  dimension  than  three  available  on  the 
iPSC/2  hypercube  used  for  this  thesis.  Since  the  number  of  working  nodes  is  not  fixed 
for  this  algorithm,  PSGP  can  be  applied  to  any  dimension  hypercube  without 
modifications.  An  increase  in  efficiency  should  occur  with  an  increase  in  the  hypercube 
dimension  until  the  time  to  distribute  a  separate  satellite  data  set  to  each  working  node 
exceeds  the  time  required  by  the  node  to  propagate  a  single  satellite.  Hence,  the 
performance  of  the  algorithm  could  improve  by  applying  it  to  an  optimal  dimension 
hypercube. 
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Since  the  hypercube  at  the  Naval  Postgraduate  School  is  limited  to  eight 
processors,  a  reasonably  accurate  model  developed  by  Phipps  for  his  analysis  of  PPT2 
was  used  to  estimate  the  optimal  hypercube  dimension.  The  total  execution  time  for 
PSGP  to  propagate  n  satellites  with  p  processors,  t(p),  can  be  modeled  by 

t(p)  =  twi(p)  +  tW2(p)  +  tc(p)  (5.1) 

where  twl(p)  is  the  time  the  last  node  must  wait  to  receive  its  first  satellite  data  set, 
tw j  (p)  is  the  total  time  the  last  node  must  wait  to  receive  all  of  its  subsequent  satellite 
data  sets,  and  tc  ( p )  is  the  time  for  each  node  to  propagate  its  share  of  the  n  satellites. 

As  discussed  in  the  previous  chapter,  the  iPSC/2  uses  a  Direct-Connect 
Module  (DCM).  Because  of  the  DCM  the  startup  time  for  a  message  to  be  passed 
between  two  nodes  is  essentially  constant  regardless  of  the  length  of  the  path  the  message 
is  sent  through.  Thus,  the ’time  to  send  a  message  between  two  nodes  is  only  a  function 
of  the  size  or  number  of  bytes  contained  in  the  message.  Since  the  size  of  all  the 
messages  between  the  distributing  node  and  the  working  nodes  are  of  the  same  size  (104 
bytes),  the  time  to  send  a  single  message  between  the  distributing  node  and  each  working 
node  is  essentially  constant  In  this  algorithm  there  are  p-2  working  nodes.  Letting 
tu{  1)  represent  the  time  to  send  a  single  message  between  the  distributing  node  and  a 
working  node,  twl  (p)  may  be  modeled  by 

twl  (p)  =  [(p -  2)  -  l]tu  (1)  =  (p  -  3)tu  (1).  (5.2) 

To  determine  tu  (1),  a  small  program  was  created  and  executed  on  the  iPSC/2  hypercube 


at  the  Naval  Postgraduate  School.  This  program  timed  the  message  passing  between  two 

nodes  for  various  size  messages.  For  each  case  the  time  to  send  a  single  byte  was 

rtisec 

calculated,  and  these  values  were  averaged  to  equal  3.60x10  - .  This  value  is 

byte 

Mbytes 

equivalent  to  2.78 -  which  when  rounded  matches  the  hardware  speed  of 

sec 


2.8  — —  —  quoted  in  the  iPSC/2  technical  summary  (1988,  p.ll).  Thus,  the  mean 
sec 

value  of  r*(l)  is  approximately 

(3.60  x  10^ )(104)  =.0374  msec. 

The  total  wait  time  for  a  working  node  to  receive  subsequent  satellite  data  sets, 
tw2(p),  is  a  function  of  the  elapsed  time  for  the  working  node  to  propagate  a  single 

satellite  data  set,  the  elapsed  time  for  the  distributing  node  to  send  a  subsequent  satellite 
data  set  to  the  working  node,  and  the  number  of  satellites  the  working  node  must 
propagate.  Since  the  distributing  node  distributes  the  data  while  the  working  nodes  are 
computing,  the  wait  time  is  zero  if  the  subsequent  satellite  data  arrives  before  the 
working  node  is  ready  to  receive  it  On  the  other  hand,  if  the  subsequent  satellite  data 
arrives  after  the  node  finished  with  the  previous  satellite  data  set,  the  wait  time  is  the 
difference  between  the  elapsed  time  for  the  distributing  node  to  send  the  node  another 
satellite  data  set  and  the  computing  time  for  the  previous  satellite.  Since  the  distributing 
node  must  send  a  satellite  data  set  to  each  of  the  other  nodes  before  sending  a  subsequent 
data  set  to  the  last  node,  the  elapsed  time  for  the  distributing  node  to  send  another  data 
set  can  also  be  modeled  by  twl  (p)  in  Equation  5.2.  Thus,  the  total  wait  time  is  the  wait 

time  for  each  subsequent  satellite  data  set  multiplied  by  the  number  of  satellite  data  sets 
received  by  each  working  node.  Letting  the  time  to  propagate  a  single  satellite  be 
represented  by  fl,  tw2  (p)  may  be  modeled  by 


0 


lU>-2  ) 


if 

if  twX  ip)  2  fl. 


(5.3) 


The  time,  fl,  was  measured  to  be  4.60  milliseconds  using  SSGP. 

Assuming  the  time  for  one  node  to  propagate  n  satellites,  r(l),  is 


the  total  computation  time  for  each  working  node,  tc  ip),  may  be  approximated  by 


*Ap)  = 


njt  1) 
P-2' 


(5.4) 


Substituting  Equation  5.1  into  Equations  4.2  and  4.3,  t^e  speedup  and 
efficiency  using  p  processors  can  be  modeled  by 


s  = 

"  Up)  twl(p)+tw2(p)+tc(p) 

E  _£p  = _ njsl) _ 

P  P  p[Kx  ip)+K2iP)+tcip)] 


(5.5) 


The  total  execution  time,  speedup,  and  efficiency  (r(p),  Sp,  and  Ep)  were  calculated 

using  Equations  5.2  -  5.5  along  with  the  following  substitutions: 

*  tu  (1)  =  .0374  msecs 

*  rl  =  4.60  msecs 

*  n  =  5000  satellites 

The  number  of  satellites  was  chosen  to  be  5000  since  this  seemed  to  be  a  reasonable 

number  for  SGP's  current  use  at  various  sensor  sites.  Figures  5.3  and  5.4  indicate  the 
estimates  of  t(p),  Sp,  and  Ep  using  4  to  1024  processors  (a  cube  dimension  of  2  to  10). 

Using  this  model,  PSGP  is  capable  of  achieving  a  maximum  efficiency  above  .9  using  a 
7  dimensional  hypercube  (128  nodes).  These  plots  are  only  estimates  of  the  true  values 
of  speedup  and  efficiency.  Their  prediction  for  the  speedup  and  efficiency  using  4  and  8 
nodes  are  slightly  higher  than  the  obtained  experimental  values,  but  most  certainly 
provide  a  good  indication  of  the  parallel  computing  potential  of  PSGP  for  higher 
dimensional  hypercubes.  Notice  for  4  and  8  nodes,  tw2  (p)  equals  zero  resulting  in  stored 
messages  in  the  local  buffers  of  the  working  nodes.  Since  tvl  ip)  is  much  much  less  than 
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Milliseconds  Milliseconds 


1 1,  several  messages  will  be  stored  in  the  buffers,  when  propagating  a  large  number  of 
satellites,  shortly  after  the  program  is  initiated. 

2.  Assessment  of  SGP4 

a.  Results 

Like  SGP,  the  parallelization  of  SGP4  also  produced  similar  trends  and 
conclusions  as  the  parallelization  of  PPT2.  Figure  5.5  is  a  plot  of  the  mean  execution 
time  for  the  serial  and  the  parallelized  versions  of  SGP4,  using  a  four-node  and  an  eight- 
node  hypercube,  versus  the  number  of  satellites  propagated.  See  Appendix  C  for  the 
execution  times.  Table  5.2  gives  the  efficiencies  and  speedups  for  a  various  number  of 
satellites  using  four  and  eight  nodes.  Again,  an  approximate  three  to  one  ratio  exists 
between  the  values  of  speedup  using  eight  nodes  versus  four  nodes.  Table  5.2  depicts  the 
same  trend  as  observed  with  PSGP.  In  the  case  of  four  nodes,  the  peak  value  occurs  at 
36  satellites  with  an  efficiency  of  .400.  For  eight  nodes  this  occurs  at  500  satellites  with 
an  efficiency  of  .595.  Notice  again  an  increase  in  efficiency  as  the  number  of  nodes 
increased  from  four  to  eight,  motivating  the  investigation  of  using  hypercubes  of  higher 
dimension. 

b.  Improvements 

The  model  used  in  the  previous  section  to  estimate  an  optimal  dimension 

hypercube  is  also  used  here,  since  PSGP  uses  the  domain  decomposition  parallel 
algorithm.  The  total  execution  time,  speedup,  and  efficiency  ( t(p),Sp ,  and  Ep)  were 

calculated  using  Equations  j.2-5.5  with  the  following  substitutions: 

*  lu  (1)  =  -0634  msec  (the  message  size  between  the  distributing  node  and  the 
working  nodes  is  176  bytes). 

*  f  1  =  6.60  msecs 


*  n  =  5950  satellites 


UHHKMMitt 
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TABLE  5.2  PSGP4  PEI 

EIFORMANCE 

Number  of 

Eight 

Nodes 

Four  1 

Modes 

Satellites 

S, 

EP 

6 

3.31 

.414 

1.55 

.388 

12 

3.85 

.481 

1.58 

.394 

36 

4.45 

.556 

1.60 

.400 

144 

4.71 

.588 

1.60 

.399 

216 

4.74 

.592 

1.59 

.398 

500 

4.76 

.595 

1.57 

.393 

1296 

4.73 

.592 

1.51 

.377 

5000 

4.54 

.568 

1.48 

.369 

10000 

4.44 

.555 

1.47 

.367 

Figure  5.5  SGP4  Execution  Times 
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The  number  of  satellites  was  chosen  based  on  the  information  received  from  the 
AFSPACECOM.  The  AFSPACECOM  in  Colorado  Springs  propagates  data  on 

approximately  7000  satellites  a  day,  and  about  85  percent  of  those  are  low  earth  orbit, 
yielding  the  5950  value.  Figures  5.6  and  5.7  depict  the  estimates  of  t(p),  SP ,  and  Ep 

using  4  to  1024  processors.  Using  this  model,  PSGP4  is  capable  of  achieving  a 
maximum  efficiency  greater  than  .9  using  a  6  dimensional  hypercube  (64  nodes).  As 
with  PSGP,  the  estimates  using  this  model  for  PSGP4  predict  a  slightly  better 
performance  than  actually  achieved  for  a  hypercube  of  2  and  3  dimensions. 

3.  Assessment  of  SDP4 

a.  Results 

The  paralk -nation  of  SDP4  was  also  successful,  yielding  similar  results  as  the 
previous  parallel  algorithms.  Figure  5.8  plots  the  mean  execution  time  for  the  serial  and 
the  parallelized  versions  of  SDP4,  using  a  four-node  and  an  eight-node  hypercube,  versus 
the  number  of  satellites  propagated.  See  Appendix  C  for  the  execution  times.  Table  5.3 
gives  the  efficiencies  and  speedups  for  different  number  of  satellites  using  four  nodes  and 
eight  nodes.  As  expected  the  speedups  achieved  using  eight  nodes  is  approximately  three 
times  greater  than  that  for  four  nodes.  Again,  there  are  higher  efficiencies  obtained  using 
eight  nodes  versus  four  nodes  for  any  given  number  of  satellites.  As  expected,  peak 
efficiencies  exist  as  a  function  of  the  number  of  satellites  for  both  the  four  node  and 
eight  node  case.  In  using  eight  nodes  a  peak  efficiency  of  .644  is  achieved  when 
propagating  500  satellites.  In  using  four  nodes  a  peak  efficiency  of  .433  occurs  at  144 
satellites. 

b.  Improvements 

Using  the  same  model,  t(p),  Sp,  and  Ep  were  calculated  using  Equations  5.2  - 
5.5  with  the  following  substitutions: 
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n  =  5950,  tu  (1)  =.0634  msec,  1 1  =  6.6  msec 
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Figure  5.6  Estimated  Execution  Time  of  PSGP4  for  Various  Hypercube  Sizes 
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Figure  5.7  Estimated  Speedup  and  Efficiency  of  PSGP4  for  Various  Hypercube  Sizes 
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Figure  5.8  SDP4  Execution  Times 


TABLE  5.3  PSDP4PE 

RPQRMANCE 

Number  of 

Satellites 

Eight 

Nodes 

Four! 

Nodes 

s. 

E, 

s. 

6 

3.92 

.490 

1.70 

.425 

12 

4.48 

.560 

1.71 

.428 

36 

4.93 

.616 

1.73 

.432 

144 

5.13 

.641 

1.73 

.433 

216 

5.15 

.644 

1.73 

.432 

500 

5.15 

.644 

1.71 

.428 

1296 

5.14 

.642 

1.66 

.416 

5000 

4.98 

.623 

1.64 

.410 

10000 


4.91 


.615 


1.63 


.408 


*  tu( 1)  =  .0634  msec 

*  rl  =  10.8  msecs 

*  n  =  1050  satellites 

The  value  1050  was  chosen  since  approximately  15  percent  of  the  7000  satellites 

propagated  a  day  at  the  AFSPACECOM  are  deep  space.  Figures  5.9  and  5.10  indicate 
the  estimates  of  t(p),  Sp,  and  Ep  using  4  to  1024  processors.  Using  this  model  PSDP4 

is  capable  of  achieving  a  maximum  efficiency  of  about  .9  using  a  6  dimensional 
hypercube  (64  nodes). 

4.  Comparing  the  Parallel  Algorithms  for  PPT2,  SGP,  SGP4,  and  SDP4 

Before  comparing  the  four  models,  a  few  parameter  values  need  to  be  presented 
from  Phipp's  assessment  of  PPT2.  The  time  to  propagate  a  single  satellite,  rl,  using  the 
parallel  version  of  PPT2  is  1 1.2  milliseconds.  The  maximum  efficiency  achieved  on  an  8 
node  hypercube  was  .67.  Using  4  nodes,  the  maximum  efficiency  was  .45.,  The  number 
of  satellites  propagated  ranged  from  12  to  20736.  Using  the  same  model  presented  in 
this  thesis,  derived  by  Phipps  (1992),  to  predict  program  performance  versus  hypercube 
dimension  with  the  following  substitutions: 

*  tu  (1)  =  .693  msec 

*  rl  =  11.2  msecs 

*  n  =  1728  satellites, 

the  parallelized  version  of  PPT2  is  capable  of  achieving  near  .9  efficiency  using  a  4 
dimensional  hypercube  (16  nodes). 

One  observation  that  can  be  obtained  in  comparing  these  four  models,  using  the 
experimental  data,  is  based  on  the  execution  time  to  propagate  a  single  satellite,  rl,  and 
the  resulting  efficiencies  for  an  eight-node  and  a  four-node  hypeicube.  Table  5.4  shGws 
an  increase  in  efficiency  for  both  the  eight-node  and  the  four-node  hypercube  with  an 
increase  in  satellite  propagation  time,  rl.  This  conclusion  is  not  surprising  since  an 
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gure  5.10  Estimated  Execution  Time  of  PSDP4  for  Various  Hypercube  Sizes 


Figure  5.11  Estimated  Speedup  and  Efficiency  of  PSDP4  for  Various  Hypercube  Sizes 
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TABLE  5.4  PERFORMANCE  COMPARISON 


Satellite  1 

Motion  Model  j  ^  (msecs) 

Maximum  E% 

Maximum  £4 

SGP 

4.6 

.56 

.38 

SGP4 

6.6 

.60 

.40 

SDP4 

10.8 

.64 

.43 

PPT2 

11.2 

.67 

.45 

I 
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increase  in  execution  time  decreases  the  communication  to  computation  ratio.  This 
concept  will  be  analyzed  further  in  the  next  chapter. 
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VL  FURTHER  ANALYSIS  OF  PHIPPS’  DOMAIN  DECOMPOSITION  MODEL 


The  analysis  in  this  chapter  will  only  be  concerned  with  SGP4  and  SDP4  since  these 
models  are  currently  used  operationally  by  the  AFSPACECOM.  Also,  the  resulting 
conclusions  can  be  applied  to  SGP,  since  like  SGP4  and  SDP4,  SGP  is  an  analytic  model 
with  a  comparable  execution  time.  In  the  previous  chapter  Phipps'  (1992)  domain 
decomposition  model  was  used  to  determine  the  optimal  dimension  hypercube  for  PSGP, 
PSGP4,  and  PSDP4.  In  this  analysis  the  value  of  tl,  the  time  to  propagate  a  single 
satellite,  represented  only  one  call  to  the  satellite  motion  subroutine  resulting  in  one  set 
of  output  for  a  specified  time  beyond  epoch.  This  chapter  will  consider  the  effect  of 
several  calls  to  the  subroutine  per  satellite,  which  is  more  realistic  operationally.  Each 
call  corresponds  to  a  specified  time  beyond  epoch  producing  an  output  consisting  of  a 
position  vector  and  velocity  vector. 

Another  factor  this  chapter  will  address  is  the  effects  of  the  input  node  and  output 
node  (i.e„  the  reading  and  writing  portions  of  the  program)  on  the  program  performance. 
The  model  developed  by  Phipps  (1992)  to  assess  the  performance  of  his  domain 
decomposition  strategy  did  not  include  timing  the  input  node  or  output  node.  The 
iPSC/2  hypercube  at  the  Naval  Postgraduate  School  is  not  capable  of  concurrent  reading 
or  writing  from  the  disk  by  its  processors.  In  other  words,  only  one  node  can  access  the 
disk  at  any  given  time.  The  hardware,  however,  does  exist  to  give  the  hypercube  this 
capability.  Assuming  his  domain  decomposition  strategy  will  eventually  be  applied  to  a 
hypercube  with  this  hardware  or  possibly  another  type  of  parallel  mac.iine  with 
concurrent  disk  access  capabilities,  Phipps  chose  not  to  investigate  the  effects  of  the  input 
node  and  output  node  at  this  phase  of  his  research. 


For  the  sake  of  completeness  this  chapter  develops  a  model  for  Phipps'  domain 
decomposition  strategy  using  the  iPSC/2  hypercube  at  the  Naval  Postgraduate  School 
which  includes  the  effects  of  the  input  and  output  nodes.  The  model's  validity  is 
obtained  by  comparing  some  of  its  theoretical  program  times  with  the  corresponding 
experimental  times.  It  is  assumed  the  hypercube  has  its  current  capability  where  only 
one  node  can  access  the  disk  at  any  given  time.  The  objective  in  analyzing  this  new 
model  is  to  measure  the  performance  of  the  entire  domain  decomposition  strategy 
developed  by  Phipps  (1992)  when  applied  to  the  iPSC/2  hypercube  at  the  Naval  Post 
Graduate  School,  and  use  this  assessment  to  further  improve  the  parallelization  strategy 
for  SGP4,  SDP4,  and  PPT2. 

A.  ASSESSMENT  OF  SEVERAL  SUBROUTINE  CALLS  PER  SATELLITE 

Here  Phipps'  model  will  be  used  to  determine  the  optimal  dimension  hypercube  for 
SGP4  and  SDP4  assuming  a  reasonable  number  of  subroutine  calls  for  each  satellite 
motion  model.  The  number  of  subroutine  calls  were  chosen  based  on  estimates  received 
from  the  AFSPACECOM  in  Colorado  Springs. 

1.  Assessment  of  SGP4 

SGP4  propagates  data  for  low  earth  satellites  which  require  more  frequent 
tracking  than  deep  space  satellites.  Thus,  a  relatively  large  number  of  observations  are 
received  per  day  by  the  AFSPACECOM  for  each  low  earth  satellite  resulting  in  several 
calls  to  the  SGP4  subroutine.  Most  likely  the  number  of  subroutine  calls  will  fall 
between  50  and  100.  An  average  value  of  75  will  be  used  in  Phipps'  model. 

All  parameter  values  will  remain  the  same  as  presented  in  the  assessment  of 
SGP4  in  the  previous  chapter  except  for  tl,  the  time  to  propagate  a  single  satellite.  Each 
time  a  new  set  of  satellite  data  is  received  by  SGP4  an  initialization  subroutine  is  called 
before  the  SGP4  main  subroutine  is  called  (i.e.,  the  subroutine  that  produces  the  position 
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and  velocity  vectors  for  the  specified  time  since  epoch).  For  every  other  incremented 
time  specified  for  the  same  satellite,  the  initialization  program  is  not  called.  Thus,  the 
execution  times  for  these  specified  times  are  slightly  less  than  the  computation  time  it 
takes  for  the  first  time  value  specified.  Let  these  shorter  execution  times  which  are  of 
equal  value  be  represented  by  the  parameter  ts,  and  let  tf  denote  the  first  execution  time. 
Let  m  represent  the  number  of  subroutine  calls  per  satellite.  Therefore,  fl  is  expressed 

by 

tl  =  tf+(m-l)ts.  (6.1) 

Thus,  rl  is  calculated  to  be 

rl  =  6.6  +  (74)(2.2)  =  169.4  msec. 

Figure  6.1  depicts  the  speedup  and  efficiency  versus  hypercube  dimension 
obtained  from  Phipps'  model  and  using  the  following  substitutions: 

*  tM  (1)  =  .0634  msecs 

*  rl  =  169.4  msecs 

*  n  -  5950  satellites 

Since  the  number  of  satellites,  n,  and  the  time  to  send  a  single  message  from  the  input 
node  to  the  output  node,  rM(l),  remains  the  same  as  for  the  m=l  case,  a  direct 
comparison  can  be  made  between  the  graphs  in  Figure  6.1  and  Figure  5.7.  Clearly,  much 
higher  speedups  are  obtainable  for  the  m=75  case.  If  a  large  speedup  is  desired  at  the 
expense  of  efficiency,  a  maximum  speedup  near  1800  is  achievable  compared  to  a 
maximum  speedup  near  100  for  the  m=l  case.  If  maximum  efficiency  is  desired  the 
m=75  case  depicts  the  potential  to  achieve  near  100%  efficiency  using  a  hypercube  of 
dimension  eight  (256  nodes)  with  a  corresponding  speedup  near  250,  compared  to  the 
m=l  case  where  the  maximum  efficiency  is  also  nearly  100%  but  corresponding  to  a 
speedup  just  over  60  using  a  hypercube  of  dimension  six  (64  nodes).  The  above 


Figure  6.1  Estimated  Speedup  and  Efficiency  of  PSGP4  for  Various  Hypercube  Sizes 


assessment  is  expected  since  a  higher  m  yields  an  increase  in  the  computation  to 
communication  ratio. 

2.  Assessment  of  SDP4 

SDP4  propagates  data  for  deep  space  satellites  where  the  subroutine  calls  are 
likely  to  fall  between  1  and  50.  An  average  value  of  25  will  be  used  in  Phipps'  model. 
Thus,  t\  is  calculated  to  be 

10.8  +  (24)(4)  =  106.8  msecs. 

Using  the  above  value  for  rl  and  the  same  substitutions  for  tM  (1)  and  n  as  for  the  m= 1 
case, 

*  tu  (1)  =  .0634  msecs 

*  n  =  1050  satellites. 

Figure  6.2  depict  the  speedup  and  efficiency.  In  comparing  Figure  6.2  with  Figure  5.11 
(the  m=l  case),  much  higher  speedups  are  observed  with  the  m=25  case  yielding  a 
maximum  speedup  near  650  compared  to  a  maximum  speedup  near  150  for  the  m=l 
case.  Considering  maximum  efficiency,  the  m= 25  case  has  the  potential  to  achieve  near 
100%  efficiency  using  a  hypercube  of  dimension  seven  (128  nodes)  with  a  corresponding 
speedup  of  about  125.  Whereas  for  the  m= 1  case  the  maximum  efficiency  is  achieved 
with  a  hypercube  of  dimension  six  (64  nodes''  and  a  corresponding  speedup  near  60. 
Again,  these  results  are  expected  due  to  the  increased  computation  time  with  a  higher  m. 


Figure  6.2  Estimated  Speedup  and  Efficiency  of  PSDP4  for  Various  Hypercube  Sizes 


Notice,  the  speedup  achievable  is  not  as  high  as  for  PSGP4,  because  the  number  of 
subroutine  calls  per  satellite  is  one-third  that  of  PSGP4  (i.e.,  25  calls  compared  to  75 
calls). 

B.  REVISING  PHIPPS’  MODEL  TO  INCLUDE  READS  AND  WRITES 

Reading  and  writing  to  a  disk  is  extremely  costly  in  terms  of  computer  time.  As  will 
be  observed  in  assessing  this  revised  model  for  SGP4  and  SDP4,  the  initial  access  to  the 
disk  by  the  input  node  or  output  node  produces  a  relatively  large  overhead  for  each  of  the 
programs.  If  the  calculation  portion  of  the  satellite  motion  program  isn't  large  enough  in 
terms  of  computer  time  relative  to  the  initial  disk  access  time  as  well  as  the  time  to 
continue  reading  or  writing  from  or  to  the  disk  after  initial  access,  the  parallel  algorithm 
performance  suffers  greatly.  This  section  of  Chapter  VI  develops  a  model  for  Phipps' 
domain  decomposition  strategy  which  includes  the  effects  of  the  input  and  output  node, 
assuming  the  use  of  the  iPSC/2  hypercube  where  concurrent  disk  access  is  unavailable. 

1.  Revised  Model 

Since  the  output  node  is  always  the  last  node  to  finish  in  Phipps'  domain 
decomposition  algorithm,  the  program  total  time,  t(p),  can  be  determined  by  the  sum  of 
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the  output  node's  total  wait  time,  tw,  and  the  total  time  to  write  to  the  disk,  td. 
Therefore,  the  total  time,  t{p),  is  represented  by 

t(p)  =  tw+td.  (6.2) 

The  total  time  to  write  to  the  disk,  td,  can  be  modeled  by 

td  =  td,+  ( n)(m)(tda )  (6.3) 

where 

td  ~  initial  disk  access  time  which  includes  time  to  write  labels  for  output  columns 

tda-  time  to  write  data  output  (position  and  velocity  vector)  for  each  time  specified 
(i.e.,  each  time  specified  corresponds  to  a  subroutine  cal)) 

n  =  number  of  satellites 

m  =  number  of  subroutine  calls  per  satellite. 

The  total  wait  time,  tw,  can  be  broken  in  two  areas,  the  initial  wait  time,  twit  the 

output  node  must,  wait  before  it  can  initially  access  the  disk  (i.e.,  assuming  the  input  node 

(node  zero)  accesses  the  disk  first  which  has  been  observed  to  be  more  likely),  and  the 

total  sum  of  the  remaining  wait  times,  twa.  The  initial  wait  time,  tw,,  is  the  total  time  for 

all  the  satellites  to  be  read  in  by  node  zero.  Thus,  tw,  can  be  modeled  by 

tw,  =tr,+(n-  l)rra  (6.4) 

where 

tr,  =  time  for  node  zero  to  initially  access  the  disk  which  includes  reading  the  first 
satellite  data  set 

tra  =  time  to  read  each  satellite  data  set  after  the  first  one. 

The  total  sum  of  the  remaining  wait  times,  twa,  depends  on  the  time  to  propagate  a  single 
satellite,  rl,  by  each  of  the  working  nodes  and  the  number  of  working  nodes.  Since  the 
initial  disk  access  time,  td,,  far  exceeds  rl  for  SGP4  and  SDP4,  even  with  considering  75 
subroutine  calls  for  SGP4  and  25  subroutine  calls  for  SDP4,  there  is  no  wait  time  by  the 
output  node  between  completing  its  initial  disk  access  and  writing  its  first  set  of  output 
data.  For  all  subsequent  satellite  data  sets  the  wait  time  is  also  zero  since  rl,  the  time  to 


propagate  the  output  for  a  single  satellite,  is  found  to  be  less  than  the  time  to  write  the 
satellite's  output  which  may  be  modeled  by 

(m)(tda)  (6.5) 

for  each  of  the  satellite  programs  SGP4  and  SDP4.  This  also  implies  using  any  more 
than  two  working  processors  will  not  improve  program  performance,  therefore  p,  the 
number  of  processors,  will  equal  four  for  this  model. 

Combining  Equations  6.2  -  6.5  the  total  program  time  for  PSGP,  PSGP4,  and 
PSDP4  can  be  modeled  by 

t(  4)  =  tri  +  (n~  1  )tra  +  td,  +  ( n)(m)tda .  (6.6) 

Combining  Equations  6.1  -  6.5,  the  time  to  execute  SGP4  and  SDP4  using  one  processor, 
r(l),  is  expressed  by 

r(l)  =  rr,  +  (n  - 1  )tra  +  n[tf  +  (m-  l)tt]+ td,  +  ( n)(m)tda .  (6.7) 

Thus,  using  Equations  4.2  ,4.3,  6.6,  and  6.7  the  speedup  and  efficiency  using  four 
processors  may  be  modeled  by 

S  -  ^  ^  +(”-!)**•«  +n[tf  +  (m-l)ts]  +  td,  +(n)(m)tda 

4  r(4)  trt  +  (n  -  l)rrB  +  td,  +  ( n)(m)tda 

(6.8) 

E  -  -  tr'  +(n~Vtra  +n\tf  +  (m-l)ts]  +  td,  +(n)(m)tda 

4  4[trt  +(n  —  l)rra  +  tdt+  (n)(m)tda] 

During  the  development  of  this  model,  consideration  was  given  to  the  scenario  in 
which  the  input  node  reads  only  one  satellite  data  set  (or  a  small  portion  of  the  entire 
latch  of  satellite  data  sets),  distribute  to  a  working  node  (or  working  nodes)  and  then 
read  the  next  satellite  data  set  (or  next  portion  of  satellite  data  sets),  then  distribute,  etc., 
vice  reading  the  entire  batch  of  satellite  data  sets  and  then  distributing.  Since  the 
restriction  still  exists  where  only  one  of  the  input  or  output  nodes  can  access  the  disk  at 
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any  given  time  while  the  other  must  wait,  the  results  with  this  revised  method  of  reading 
the  disk  would  not  present  a  significant  difference  in  performance. 

a.  Assessment  of  SGP4 

Table  6. 1  gives  values  for  speedup  and  efficiency  for  various  n  and  m  values 
(i.e.,  number  of  satellites  and  number  of  subroutine  calls  per  satellite)  using  the  revised 
model  and  the  following  substitutions  obtained  experimentally: 

*  trt=  514  msecs  *  ts  =  2.2  msecs 

*  tra  =  18.2  msecs  *  tdt  =  542  msecs 

*  tf  =  6.6  msecs  *  tda  =  8.85  msecs 

The  single  read  and  write  times  (tra  and  tda)  varied  with  the  number  of  satellites  and  the 
number  of  subroutine  calls  per  satellite  (n  and  m).  The  read  time  per  satellite  increased 
somewhat  as  the  number  of  satellites,  n,  increased  (values  ranging  from  16.4  to  22.7 
milliseconds),  so  an  average  of  18.2  milliseconds  is  used  for  tra.  The  write  time  for  each 

subroutine  call  decreased  somewhat  as  the  number  of  subroutine  calls  per  satellite,  m, 
increased  (values  ranging  from  12.1  down  to  7.73  milliseconds),  thus  an  average  value  of 
8.85  milliseconds  is  used  for  tda.  Table  6.1  indicates  minimal  variance  in  speedup  or 
efficiency  with  changes  in  m  or  n. 

The  speedup  and  efficiency  were  obtained  experimentally  using  four  and  eight 
nodes  for  the  m= 5  case  for  various  numbers  of  satellites,  and  are  given  in  Table  6.2. 
These  actual  efficiencies  and  speedups  indicate  that  the  estimated  performance  values 
listed  in  Table  6.1  are  low,  but  still  reasonable.  Table  6.2  reinforces  the  observation  that 
more  than  four  nodes  does  not  improve  performance,  in  fact  the  contrary  is  true. 

The  speedups  and  efficiencies  depicted  both  theoretically  and  experimentally 
using  four  nodes  are  not  bad  considering  there  are  only  two  working  nodes.  But, 


TABLE  6.1  PSGP4  ESTIMATED  PERFORMANCE 


Number 

of 

Satellites 

(n) 

Number  of  Subroutine  Calls  Per  Satellite  (m) 

1 

5 

25 

75 

100 

Kfl 

H 

Kfl 

19 

Kfl 

19 

n 

91 

H 

19 

144 

1.19 

.298 

1.22 

.305 

1.24 

.310 

1.25 

.313 

1.25 

.313 

500 

1.23 

.308 

1.24 

.310 

1.25 

.313 

1.25 

.313 

1.25 

.313 

1050 

1.24 

.310 

1.24 

.310 

1.25 

.313 

1.25 

.313 

1.25 

.313 

5950 

1.24 

.310 

1.25 

.313 

1.25 

.313 

1.25 

.313 

1.25 

.313 

TABLE  6.2  PSGP4  ACTUAL  PERFORMANCE 


Number  of 

Four  Nodes 

Eight  Nodes 

Satellites  (n) 

s. 

EP 

12 

1.30 

.325 

1.03 

.128 

144 

1.66 

.411 

1.59 

.199 

500 

1.62 

.406 

1.61 

.201 

1296 

1.60 

.401 

1.59 

.199 

obviously,  this  configuration  is  not  desii  •  SGP4  since  the  performance  is  limited 
to  what  can  be  achieved  using  four  nodes. 
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b.  Assessment  of  SDP4 

Table  6.3  gives  values  for  speedup  and  efficiency  for  various  n  and  m  values 
using  the  revised  model  with  the  following  substitutions  obtained  experimentally: 

*  tf;  =  514  msecs  *  ts=4msecs 

*  tra-  18.2  msecs  *  tdt  =  542  msecs 

*  tf  =  10.8  msecs  *  tda-  8.85  msecs 

Again  minimal  variance  in  speedup  or  efficiency  is  observed  with  changes  in  m  or  n. 

The  speedup  and  efficiency  were  obtained  experimentally  using  four  and  eight 
nodes  for  the  m- 5  case  for  various  numbers  of  satellites,  and  are  given  in  Table  6.4. 
These  actual  efficiencies  and  speedups  indicate  again  that  the  estimated  performance 
values  listed  in  Table  6.3  are  low,  but  still  reasonable.  Table  6.4  also  reinforces  the 
observation  that  more  than  four  nodes  does  not  improve  performance. 

The  performance  of  SDP4  yields  better  results  than  SGP4  since  SDP4  has  a 
higher  propagation  time  per  satellite,  but  this  configuration  for  SDP4  is  still  limiting 
since  it  is  restricted  to  four  nodes. 

c.  Conclusion 

Applying  Phipps'  domain  decomposition  strategy  to  SGP4,  SDP4,  and  most 
certainly  SGP  (i.e.,  since  SGP  has  an  even  smaller  propagation  time  per  satellite),  on  the 
iPSC/2  hypercube  in  the  Mathematics  Department  at  the  Naval  Postgraduate  School 
which  does  not  have  concurrent  disk  access  capability  has  limited  performance 
capabilities  (i.e.,  restricted  to  the  use  of  four  nc  Most  likely  this  conclusion  will  also 

hold  for  PPT2.  But,  there  is  great  potential  fo»  ,  '  domain  decomposition  strategy  if 

applied  to  a  parallel  computer  that  has  the  ability  »•  perform  concurrent  disk  access.  The 
more  processors  that  can  access  the  disk  simultaneously  will  likely  result  in  more 
improved  efficiencies  and  speedups. 


TABLE  6.3  PSDP4  ESTIMATED  PERFORMANCE 


Number  of  Subroutine  Calls  Per  Satellite  (m) 

Number 

of 

Satellites 
(n) 


144 

1.32 

.330 

1.38 

.345 

1.43 

.358 

1.45 

.363 

1.45 

.363 

500 

1.37 

.343 

1.41 

.353 

1.44 

.360 

1.45 

.363 

1.45 

.363 

1050 

1.39 

.348 

1.42 

.355 

1.44 

.360 

1.45 

.363 

1.45 

.363 

5950 

1.40 

.350 

1.43 

.358 

1.45 

.363 

1.45 

.363 

1.45 

.363 

TABLE  6.4  PSDP4  ACTUAL  PERFORMANCE 


Number  of 


Satellites  (n) 


Four  Nodes 
(m  =  5) 


Eight  Nodes 
(m  =5) 


12 

1.32 

.331 

1.07 

.134 

144 

1.80 

.451 

1.74 

.218 

500 

1.74 

.435 

1.73 

.216 

1296 

1.69 

.423 

1.69 

.211 

o 


m  CONCLUSIONS  AND  SUGGESTIONS  FOR  FURTHER  RESEARCH 


The  overall  objective  of  this  thesis  was  to  determine  the  parallel  computing  potential 
of  the  AFSPACECUM  satellite  motion  models.  The  results  given  in  Chapter  V  and 
Chapter  VI  indicate  that  Phipps'  domain  decomposition  strategy  holds  promise, 
especially  if  applied  to  a  hypercube  with  concurrent  disk  access  capability.  This 
algorithm  is  simple  to  apply.  It  provides  the  flexibility  to  vary  the  dimension  of  the 
hypercube  and  makes  modifications  to  the  satellite  motion  models  easy  to  implement 

When  omitting  the  effect  of  the  input  and  output  nodes,  a  maximum  efficiency  of 
only  .56  was  achieved  for  SGP,  .60  for  SGP4,  and  .64  for  SDP4,  but  the  potential 
efficiency  is  artificially  bounded  by  the  number  of  nodes  available  with  the  specific 
hypercube  used.  Having  a  maximum  of  only  eight  nodes  available,  which  implies  six 
working  nodes,  the  efficiency  of  the  domain  decomposition  algorithm  is  bounded  above 
by  .75.  Using  Phipps'  domain  decomposition  model,  it  was  shown  that  a  maximum 
efficiency  near  100%  could  be  achieved  for  the  m=l  case  (i.e.,  one  subroutine  call  per 
satellite)  with  a  hypercube  of  128  nodes  for  PSGP  and  a  hypercube  of  64  nodes  for 
PSGP4  and  PSDP4.  In  comparison,  die  parallelized  version  of  PPT2  achieves  a 
maximum  efficiency  at  16  nodes.  In  the  case  of  higher  m  values  (25  for  PSDP4  and  75 
for  PSGP4)  much  higher  speedups  are  obtainable  yielding  a  maximum  speedup  of  1800 
for  PSGP4  and  650  for  PSDP4,  but  at  lower  efficiencies.  Also,  these  higher  m  values 
produced  maximum  efficiencies  near  100%  with  a  hypercube  of  256  nodes  for  PSGP4 
and  a  hypercube  of  128  for  PSDP4. 

Including  the  effects  of  the  input  and  output  nodes  significantly  limits  the 
performance  capabilities  of  PSGP,  PSGP4,  and  PSDP4  when  applied  to  a  hypercube 
without  concurrent  disk  access  capability.  Here  it  was  discovered  that  the  performance  is 


restricted  to  what  can  be  achieved  using  four  nodes,  which  creates  an  upper  bound  of  .5 
for  the  efficiency  corresponding  to  a  speedup  of  two.  Experimentally,  a  maximum 
efficiency  of  .41  was  achieved  with  PSGP4  and  .45  with  PSDP4. 

In  conclusion,  applying  Phipps'  domain  decomposition  strategy  to  a  hypercube  with 
concurrent  disk  access  capability  could  result  in  speedup  factors  that  would  significantly 
reduce  the  time  to  predict  state  vectors  for  several  thousand  satellites. 

The  success  in  applying  Phipps'  domain  decomposition  strategy  to  SGP,  SGP4, 
SDP4,  and  PPT2  using  the  iPSC/2  hypercube  at  the  Naval  Postgraduate  School  creates 
several  possibilities  for  future  research.  One  area  of  research  would  involve  applying  the 
parallelized  versions  of  SGP,  SGP4,  SDP4,  and  PPT2  to  higher  dimension  hypercubes 
that  have  concurrent  disk  access  capabilities  and  validate  the  estimates  presented  by 
Phipps'  domain  decomposition  model,  thus  omitting  the  effects  of  the  input  and  output 
nodes.  Then  investigate  the  performance  obtainable  using  the  concurrent  disk  access 
capabilities  which  will  be  dependent  upon  the  number  of  nodes  that  can  access  the  disk 
simultaneously. 

Another  possible  area  of  research  would  involve  modifying  the  current  satellite 
motion  models  to  increase  the  accuracy  of  their  predictions.  The  results  in  Chapters  V 
and  VI  showed  an  increase  in  performance  if  the  amount  of  computation  was  increased. 
Thus,  greater  accuracy  could  be  achieved  in  far  less  time  using  Phipps’  domain 
decomposition  algorithm  for  SGP,  SGP4,  SDP4,  and  PPT2  than  the  time  using  the 
original  serial  algorithms.  Also,  from  these  results,  applying  this  algorithm  to  satellite 
motion  models  that  are  more  computationally  intensive,  such  as  semi-analytic  models, 
would  yield  greater  parallel  computing  potential. 

Investigating  other  methods  of  parallelization  presents  another  wide  open  area  of 
research.  One  method  of  interest  involves  the  use  of  Parallel  Virtual  Machines  (PVM). 
This  software  provides  a  means  to  make  a  group  of  work  stations  act  like  a  parallel 
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computer.  Parallelizing  through  PVM  is  appealing,  since  work  stations  have  already 
proven  to  be  quite  invaluable  for  the  day  to  day  tasks  in  business  and  academic 
organizations  due  to  their  versatility,  affordability,  and  performance  capabilities,  and 
having  the  added  ability  of  parallel  computing  opens  up  new  areas  of  computer  power. 
With  the  growing  pace  in  parallel  computer  technology,  research  in  applying  satellite 
propagation  models  to  the  more  recently  developed  parallel  computers  can  lead  to 
tremendous  breakthroughs. 

In  conclusion,  the  results  of  Phipps'  (1992)  thesis  combined  with  the  results  from 
this  thesis  clearly  shows  that  satellite  position  prediction  can  be  made  more  timely 
through  parallel  computing.  Although  the  best  method  of  parallelization  may  vary 
depending  on  the  specific  model  used  as  well  as  the  type  of  parallel  computer  used, 
parallel  computing  presents  one  sure  avenue  to  achieve  timely  satellite  position 
prediction  for  the  growing  number  of  earth  orbiting  objects. 


APPENDIX  A 


INTEL  iPSC/2  SPECIFICATIONS 

This  appendix  contains  a  summary  of  the  iPSC/2  hypercube  multcomputer 
specifications  as  described  in  (iPSC/2  User's  Guide,  1990,  pp.  1-1  -  1-11).  The  exact 
performance  values  were  obtained  from  (Arshi,  1988,  pp.  17-22). 

iPSC/2 


Svstem  Resource  Manager  (Host) 

Central  Processing  Unit 

INTEL  80386  (4  MIP) 

Numeric  Processing  Unit 

INTEL  80387  (250  KFLOP  64-bit) 

Memory 

8  Mbyte 

External  Communication 

Ethernet  TCP/IP  local  area  network  port 

Operating  System 

AT&T  Unix,  Version  V,  Release  3.0 

Nodes 

Node  Processor 

INTEL  80386  (4  MIP) 

Numeric  Co-processor 

INTEL  80387  (250  KFLOP  64-bit) 

Operating  System 

NX/2 

Internal  Communication 

Direct-Connect™  Routing 

Message  Latency  —  350  jisec 

Message  Bandwidth  -  2.8 


APPENDIX  B 


SOURCE  CODE  LISTING 

This  appendix  contains  a  listing  of  the  host  and  node  programs  for  the  parallelized 
version  of  SGP,  and  SGP4/SDP4.  Before  each  program  set  the  input  parameters  are 
listed.  Recall,  SGP4  and  SDP4  are  two  separate  subroutines  that  are  part  of  the  same 
program.  The  host  program  for  each  satellite  motion  model  loads  the  node  program  and 
clears  the  nodes  once  the  process  is  complete.  The  node  program  contains  the 
instructions  for  the  nodes  to  complete  their  respective  portions  of  the  parallel  algorithm. 
The  node  program  assigns  Node  0  to  be  the  distributing  node,  the  highest  numbered  node 
to  be  the  collecting  node,  and  the  remaining  nodes  to  be  the  working  nodes.  The 
working  nodes  execute  the  original  SGP,  or  SGP4/SDP4  subroutine  with  only  minor 
modifications. 


Input  Parameters 

Parameter 

iept 

ts 

tf 

aelt 

epoch 

xndt2o 

xndd6o 


SGP 


Description 
End  of  file  indicator 

Start  Line  (first  propagation  time  in  minutes) 

Stop  time  (last  propagation  time  in  minutes) 

Time  increment  in  minutes 
(determines  number  of  propagation  times) 

Epoch  reference  time  (year.day) 

Time  derivative  of  mean  motion  divided  by  two 


Rev 
Day 2 


Second  time  derivative  of  mean  motion  divided  by  six 


Rev 
Day3 ) 


xincl 


Orbital  inclination  (degrees) 


xnodeo  Right  ascension  of  node  (degrees) 

eo  Eccentricity  (degrees) 

omegao  Argument  of  perigee  (degrees) 

xmo  Mean  anomaly  (degrees) 

u  ..  f  Rev> 
xno  Mean  motion  - 

KDay) 

There  are  a  total  of  13  parameters.  Using  double  precison  (i.e.,  8  bytes  per 
parameter),  this  forms  a  message  size,  from  input  node  to  a  working  node,  of  104  bytes. 
Host  Program: 


program  PSGPh 

*  This  host  program  loads  the  node  program  PSGPn  on  the 

*  nodes  of  the  attached  hypercube.  Upon  completion  of  the 

*  catalog  of  satellite  data,  the  program  clears  the  nodes  for 

*  another  process. 

*  Set  host  specific  parameters 

data  pid  /  0  / 

*  Set  process  id 

cal!  setpid  (pid) 

*  Load  program  PSGPn  on  the  nodes 

print  *,  'loading  nodes 
call  load  ('psgpn',  -1,  pid) 

*  Receive  message  that  nodes  are  complete 

call  crecv  (99,  istop,  4) 
print  *,  'nodes  complete’ 

*  Kill  process  on  nodes 


call  kilicube  (-1,  -1) 


program  PSGPn 


*  TLi;  program  propagates  n-2  satellites  concurrently,  where  n  is 

*  the  number  of  nodes  belonging  to  the  attached  cube.  Node  0  is 

*  the  distributing  node  and  the  Node  n-1  is  the  collecting  node. 

*  The  remaining  nodes  are  the  working  nodes  that  propagate  the 

*  satellites  using  AFSPACECOM  subroutine  SGP.  For  simplicity, 

*  the  tasks  for  all  nodes  are  combined  on  this  one  node  program. 

*  Tasks  are  partitioned  by  logical  if  statements. 

implicit  real  *  8  (a  -  h,  o  -  z) 

real  *  8  sat  (13, 25000),  f  (13),  g  (7  *  5  +  1) 

integer  hostid,  pid,  outlen 

common  /  sg  /  f  (13),  g  (7  *  5  +  1),  ig 

data  inlen  / 104  / 

data  isat  / 1  /,  n  / 1  /,  istop  / 1  /,  pid  /  0  / 
maxlen  =  8  *  (7  *  5  +  1) 

**♦  WARNING  5  IN  THE  ARRAY  G  AND  IN  THE  VALUE  MAXLEN  IS  THE  MAXIMUM 
*»’  NUMBER  OF  PROPAGATION  TIMES  FOR  A  GIVEN  SET  OF  SATELLITE  INPUT 

mynod  =  mynode( ) 
n  unui  ode  =  numnodes( ) 
hostid  =  myhost( ) 

*000000000000000000000000000000000000000000000000000000000000000000000 

*  Node  0  reads  and  distributes  data  among  the  working  nodes 

if  (mynod  .eq.  0)  then 

*  Read  complete  catalog  of  satellite  data 

open  (30,  file='satinp.r,  status='old',  form='formatted') 

10  read  (30,  *)  (sat  (j,  isat),  j  =  1,4) 
if  (sat  (1,  isat)  .eq.  0.0)  go  to  25 
read  (30, 15)  (sat  (j,  isat),  j  =  5,  10) 

15  FORMAT  (F15.8,  F10.8,  E9.5,  F8.4,  F9.4,  F9.7) 
read  (30, 20)  (sat  (j,  isat),  j  =  1 1, 13) 
if  (sat  (13,  isat)  .le.  0.0)  go  to  10 
20  FORMAT  (F8.4,  F9.4,  F12.8) 

isat  =  isat  +  1 
go  to  10 

25  close  (unit  =  30) 
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*  Send  number  of  data  sets  to  all  nodes 

isat  =  isat  - 1 

call  csend  (mynod,  isat,  4,-1,  pid) 

*  Distribute  satellite  data  sets  to  working  nodes 

itl  =  mclock( ) 
iter  =  isat  /  (numnode  -  2) 
do  1201  j  =  1,  iter 
do  1201  i «  1,  numnode  -  2 
call  csend  (mynod,  sat  (1,  n),  inlen,  i,  pid) 

1201  n  =  n+  1 

iter  =  mod  (isat,  numnode  -  2) 
if  (iter  .It.  l)go  to  1203 
n  =  n- 1 

do  1202  j  =  1,  iter 

1202  call  csend  (mynod,  sat  (1,  n  +  j),  inlen,  j,  pid) 

1203  it2  =  mclock  ( ) 

*  End  Node  0 

•000000000000000000000000000000000000000000000000000000000000000 

♦wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 

*  Begin  working  nodes 

else 

if  (mynod  ,1L  numnode  -  1)  then 

*  Receive  number  of  data  sets  from  Node  0  and  compute  total  number 

*  of  satellites  to  propagate 

call  crecv  (0,  isat,  4) 
itl  =  mclock( ) 

if  (mynod  .le.  mod  (isat,  numnode  -  2))  then 
iter  =  isat  /  (numnode  -  2)  +  1 
else 

iter  =  isat  /  (numnode  -  2) 
endif 

*  Receive  satellite  data,  execute  SGP,  and  send  results  to 

*  collecting  node 

do  1220  i  =  1,  iter 
call  crecv  (0,  f,  inlen) 
call  sgpm( ) 
outlen  =  8  *  ( 7*  ig  +  1) 

1220  call  csend  (mynod,  g,  outlen,  numnode  - 1, 0) 
it2  =  mclock( ) 


•  •  • 


•  •  • 


72 


» 


*  End  Working  Nodes 

•WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 

♦ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

*  Begin  Collecting  Node 

else 

open  (unit  =  11,  file  =  'velpos2') 

write  (11, 1235)  Tsince',  X,  T,  T,  XDOT,  TDOT, 

1235  format  (lOx,  a,  lOx,  a,  18x,  a,  18x,  a  /  26x,  a,  15x,  a,  15x,a) 

*  Receive  total  number  of  satellite  sets 

call  crecv  (0,  isat,  4) 

*  Collect  results  and  write  to  external  file  from  Working  Nodes 

itl  =  mclock( ) 
do  1240  k  =  1,  isat 
call  crecv  (-1,  g,  maxlen) 
ig  =  g(l) 

write  (unit=  11,  1245)  ((g(i),  i  =  2  +  (j  -  2)  *  7. 8  +  0  -  2)  *  7),  j  =  2 ,  ig  +  1) 

1240  continue 

it2  =  mclock( ) 

1245  format  ( /  4fl7.8  /  15x,  307.8) 
close  (unit  =  11) 

*  Send  message  to  Host  that  process  is  complete 

call  csend  (99,  istop,  4,  hostid,  pid) 

*  End  Collecting  Node 

•CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

endif 

endif 


itot  =  it2  -  itl 

print  *,  ’node# mynod, '  time  itot, '  for isat, 1  sats' 

stop 

end 


I 
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SGP4/SDP4 

Inpat  Parameters 

Here  the  input  parameters  are  read  in  a  little  differently  compared  to  SGP.  The 
reading  format  is  more  compatible  with  the  PC  version  of  SGP4/SDP4  received  from  the 
AFSPACECOM.  Also,  this  input  is  more  complete  in  terms  of  object  identification. 


Parameter 

ierdno 

satn 

yr 

rday 

xndot 


xn2dt 


le 

bterm 

ie2 

ephtyp 

icrdno2 

xinco 

xnodeo 

eo 

oraegao 

xmao 

xno 


Description 

Line  number  of  data  input 
Satellite  number 
Year 

Day  number 


Time  derivative  of  mean  motion  divided  by  two 


Rev 


Day 2 


Second  time  derb  ative  of  mean  motion  divided  by  six 


Rev 


Da? 


Exponent  of  xnd2dt 

Modified  ballistic  coefficient  (ER*1) 

Exponent  of  bterm 

Ephemeris  type 

Line  number  of  data  input 

Orbital  inclination  (degrees) 

Right  ascension  of  node  (degrees) 
Eccentricity  (degrees) 

Argument  of  perigee  (degrees) 


Mean  anomaly  (degrees) 


Mean  motion 


Rev 


iyr 


KDayj 
Year  of  start  time 


i 
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srday 

Day  number  of  start  time 

jyr 

Year  of  stop  time 

spday 

Day  number  of  stop  time 

delta 

Time  increment  in  minutes 

(Determines  number  of  propagation  times) 

There  are  a  total  of  22  parameters.  Using  double  precison  this  forms  a  message  size, 
from  input  node  to  a  working  node,  of  176  bytes. 

Host  Program: 

program  PSGP4h/PSDP4h 

*  This  host  program  loads  the  node  program  PSGP4n  on  the 

*  nodes  of  the  attached  hypercube.  Upon  completion  of  the 

*  catalog  of  satellite  data,  the  program  clears  the  nodes  for 

*  another  process. 

*  Set  host  specific  parameters 

data  pid  /  0  / 

*  Set  process  id 

call  setpiu  (pid) 

*  Load  program  PSGP4n  on  the  nodes 

print  *,  loading  nodes' 
call  load  ('psgp4n',  -I,  pid) 

*  Receive  message  that  nodes  are  complete 

call  crecv  (99,  istop,  4) 
print  *,  'nodes  complete' 

*  Kill  process  on  nodes 

call  lollcube  (-1, -1) 

stop 

end 
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program  PSGP4n 


*  This  program  propagates  n-2  satellites  concurrently,  where  n  is 

*  the  number  of  nodes  belonging  to  the  attached  cube.  Node  0  is 

*  the  distributing  node  and  the  Node  n-1  is  the  collecting  node. 

*  The  remaining  nodes  are  the  working  nodes  that  propagate  the 

*  satellites  using  AFSPACECOM  subroutine  SGP4  for  near-Earth 

*  or  SDP4  for  deep-space  objects.  For  simplicity,  the  tasks  for  all 

*  nodes  are  combined  on  this  one  node  program.  Tasks  are  par- 

*  titioned  by  logical  if  statements. 

implicit  real  *  8  (a  -  h,  o  -  z) 
real  *  8  sat  (22),  f  (22),  g  (7  *  5  +  1) 

integer  hostid,  pid,  outlen 

common  /  sg  /  f  (22),  g  (7  *  5  +  1),  ig 

data  inlen  / 176  / 

data  isat  / 1  /,  n  /I  /,  istop  / 1  /,  pid  /  0  / 
maxlen  =  8  *  (7  *  5  +  1) 

***  WARNING  5  IN  THE  ARRAY  G  AND  IN  THE  VALUE  MAXLEN  IS  THE  MAXIMUM 
***  NUMBER  OF  PROPAGATION  TIMES  FOR  A  GIVEN  SET  OF  SATELLITE  INPUT 

mynod  =  mynode( ) 
numnode  =  numnodes( ) 
hostid  =  myhost( ) 

*000000000000000000000000000000000000000000000000000000000000000000000 

*  Node  0  roads  and  distributes  data  among  the  working  nodes 

if  (mynod  .eq.  0)  then 

*  Read  complete  catalog  of  satellite  data 

open  (10,  file='satinp4.r,  status='old',  fonn='formatted') 

10  read  (10,  *)(sat(j,  isat),j  =  1,  10) 
if  (sat  (1,  isat)  .eq.  0.0)  go  to  35 
read  (10,  *)  (sat  (j,  isat),  j  =  1 1, 17) 
read  (10,  *)  (sat  (j,  isat),  j  =  18, 22) 
isat  =  isat  +  1 
35  close  (unit  =  10) 


*  Send  number  of  data  sets  to  all  nodes 
isat  =  isat  - 1 

call  csend  (mynod,  isat,  4,  -1,  pid) 
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*  Distribute  satellite  data  sets  to  working  nodes 

itl  =  mclock( ) 
iter  =  isat  /  (numnode  -  2) 
do  1201  j  =  1,  iter 
do  1201  i  =  1,  numnode  •  2 
call  csend  (mynod,  sat  (1,  n),  inlen,  i,  pid) 

1201  n  =  n  +  1 

iter  =  mod  (isat,  numnode  -  2) 
if  (iter  .lL  l)go  to  1203 
u  =  n- 1 

do  1202  j  =  1,  iter 

1202  call  csend  (mynod,  sat  (1,  n  +  j),  inlen,  j,  pid) 

1203  it2  =  mclock  ( ) 

*  End  Node  0 

*ooooooooooooooooooooooooooooooooooooooo(X)0()ooo()oooooooooooooooogo()ooo 

*\VWW\VWWWWWWWWWWWWWWWWVAVWWWVAVWWWWWWWWW 

*  Begin  wotking  nodes 

else 

if  (mynod  .lL  numnode  - 1)  then 

*  Receive  number  of  data  sets  from  Node  0  and  compile  total  number 

*  of  satellites  to  propagate 

call  crecv  (0,  isat,  4) 
itl  =  mclock( ) 

if  (mynod  .le.  mod  (isat,  numnode  -  2))  then 
iter  =  isat  /  (numnode  -  2)  +  1 
else 

iter  =  isat  /  (numnode  -  2) 
endif 

*  Receive  satellite  data,  execute  SGP4/SDP4,  and  send  results  to 

*  collecting  node 

do  1220  i  =  1,  iter 
call  crecv  (0,  f,  inlen) 
call  sgp4m( ) 
outlen  =  8  *  ( 1*  ig  +  1) 

1220  call  csend  (mynod,  g,  outlen,  numnode  - 1, 0) 
it2  =  mc!ock( ) 

*  End  Working  Nodes 

♦WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 

•ccccccccccccccccccccccccccccccccccccccccccccccccccc 
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*  Begin  Collecting  Node 

else 

open  (unit  =  12,  file  =  'velpos4')  p 

write  (11, 1235)  Tsince',  X,  T,  T,  XDOT,  YDOT,  ’ZDOT 
1235  format  (lOx,  a,  lOx,  a,  18x,  a,  18x,  a  /  26x,  a,  15x,  a,  15x,a) 

*  Receive  total  number  of  satellite  sets 

call  crecv  (0,  isat,  4)  I 

*  Collect  results  and  write  to  external  file  from  Working  Nodes 

itl  =  mclock( ) 
do  1240  k=  1,  isat 

call  creev  (-1,  g,  maxlen)  - 

ig  =  g(l) 

write  (unit  =  12,  1245)  ((g(i),  i  =  2  +  (j  -  2)  *  7,  8  +  (j  -  2)  *  7),  j  =  2 ,  ig  +  1) 

1240  continue 

it2  =  mclock( ) 

1245  fonnat  ( /  4f  17.8  /  15x,  3fl7.8) 

close  (unit  =  11)  I 

*  Send  message  to  Host  that  process  is  complete 

call  csend  (99,  istop,  4,  bostid,  pid) 

*  End  Collecting  Node  I 

•CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

endif 

endif 

itot  =  it2  -  itl 

print  *,  'node# mynod, '  time  itot,  ’  for isat,  ’  sats' 

stop 

end 

» 


I 


> 
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APPENDIX  C 


EXECUTION  TIMES 

This  appendix  contains  the  experimental  execution  times  for  the  parallel  and  serial 
versions  of  SGP,  SGP4,  and  SDP4  for  various  numbers  of  satellites.  The  parallel 
execution  times  are  for  an  eight-node  and  a  four-node  hypercube.  There  is  only  one 
subroutine  call  per  satellite,  and  the  time  to  read  and  write  is  not  included  (i.e.,  only  the 
calculational  portion  is  timed). 


SGP 

TABLE  C.1  EXECUTION  TIMES  FOR  PSGP  AND  SSGP 


Number 

PSGP 

of  1 

SSGP 

Satellites 

8  Nodes  (msec) 

4  Nodes  (msec) 

(msec) 

6 

9.70 

18.6 

26.8 

12 

15.3 

36.0 

53.0 

36 

38.9 

105 

160 

144 

143 

420 

632 

216 

213 

635 

958 

500 

489 

1470 

2190 

1296 

1280 

4090 

5750 

5000 

4890 

14700 

21900 

10000 

9800 

29500 

43900 
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SGE4 


TABLE  C.2  EXECUTION  TIMES  FOR  PSGP4  AND  SSGP4 


Number 

PSGP4 

of 

SSGP4 

Satellites 

8  Nodes  (msec) 

4  Nodes  (msec) 

(msec) 

6 

11.4 

24.3 

37.7 

12 

19.4 

47.4 

74.7 

36 

50.3 

140 

224 

144 

190 

560 

894 

216 

283 

843 

1340 

500 

653 

1980 

3110 

1296 

1700 

5340 

8050 

5000 

6840 

21000 

31100 

10000 

14000 

42400 

62100 
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|  SDE4 

TABLE  C.3  EXECUTION  TIMES  FOR  PSDP4  AND  SSDP4 


Number 

of 

Satellites 

PSDP4 

SSDP4 

(msec) 

8  Nodes  (msec) 

4  Nodes  (msec) 

6 

15.7 

36.2 

61.5 

12 

27.3 

71.3 

122 

36 

74.2 

212 

366 

144 

285 

845 

1460 

216 

426 

1270 

2190 

500 

985 

2970 

5080 

1296 

2560 

7920 

13200 

5000 

10200 

31000 

50800 

10000 

20700 

62200 

102000 

I 
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